INTRODUCTION


In  this  project  we  have  worked  towards  a  theory  of  representation  and  computation  in 
decision- and game-theoretic models. In many domains, such as the ones faced by the
military,  or  in  e-commerce,  task  allocation  (and  other  multi-agent  protocols)  is  carried  out 
in  non-cooperative  environments.  Game  theory  deals  with  decision  making  in  non- 
cooperative  environments,  but  ignores  fault-tolerance  and  related  computational  issues. 
Conversely,  most  work  done  in  computer  science  ignores  the  issue  of  decentralized 
decision making by self-interested parties. To address problems such as task allocation
effectively and efficiently, work in computer science and AI should be integrated with
work in game theory. Naturally, this raises major computational challenges that must be
addressed as well.

We  have  isolated  several  central  and  complementary  lines  of  research,  which  are 
essential  for  establishing  a  general  synthesized  theory  of  representation  and  computation 
in  decision  and  game-theoretic  models,  and  made  significant  progress  in  each  of  those 
directions. 

1.  Computational  mechanism  design: 

An  important  aspect  of  our  research  concerns  the  use  of  auctions  for  efficient  resource 
(or,  symmetrically,  task)  allocation.  This  is  an  area  we  and  others  have  worked  on  for 
several  years  now,  and  is  becoming  quite  popular  in  computer  science.  Auctions  fall 
under  the  umbrella  of  mechanism  design  in  game  theory,  which  is  the  art  and  science  of 
designing  interactions  among  agents  that  lead  -  by  virtue  of  the  agents  following  their 
own best interest - to certain desired outcomes. In an alternative view, mechanism design
injects the game-theoretic notions necessary for dealing with self-interested agents into
classical computational settings.

1a. Fair mechanism design: Fair Imposition. [1]

Traditional  work  in  auction  theory  and  economic  mechanism  design  does  not  concentrate 
on  whether  the  outcome  is  fair  in  any  sense;  the  usual  yardsticks  are  efficiency  and 
revenue  maximization  (or  cost  minimization).  We  investigated  this  new  direction,  so  that 
for  example  if  the  military  decides  to  adopt  a  market  mechanism  to  assign  transportation 
tasks to civilian carriers then the outcome will be not only efficient but also fair. We
defined the problem precisely, presented solution criteria for these problems (the central
one of which is called k-efficiency), and presented both positive results (in the form of
concrete protocols) and negative results (in the form of impossibility theorems) concerning
these criteria.


1b. Fault-tolerant mechanism design. [2]

Traditional  work  on  mechanism  design  assumes  that  agents  have  perfect  control  over 
their  actions  and  their  success.  In  contrast,  a  long-standing  interest  in  computer  science  is 
in  controlling  for  failures  of  various  kinds  -  a  computer  crashing,  a  communication  link 
going  down.  We  introduced  the  notion  of  Fault  Tolerant  Mechanism  Design,  which 
consists  of  injecting  the  computer  science  notion  of  fault  tolerance  into  the  standard  game 
theoretic  framework  of  Mechanism  Design.  Specifically,  we  defined  the  problem  of  task 
allocation  in  the  context  in  which  not  only  the  agents'  costs  are  private  information,  but 
so  are  their  probabilities  of  failure.  For  several  different  instances  of  this  setting  we 
obtained  technical  results,  including  positive  ones  in  the  form  of  specific  mechanisms 
with  provable  desired  properties,  and  negative  ones  in  the  form  of  impossibility  theorems. 

1c. Collusion in first-price auctions. [3]

We  introduced  a  class  of  mechanisms,  called  bidding  clubs,  which  allow  agents  to 
coordinate  their  bidding  in  auctions.  We  modeled  this  setting  as  a  Bayesian  game, 
including agents’ choices of whether or not to accept a bidding club’s invitation. It turns
out that for this setting in first-price auctions there exists a Bayes-Nash equilibrium
where  agents  choose  to  participate  in  bidding  clubs  when  invited  and  truthfully  declare 
their  valuations  to  the  coordinator.  Furthermore,  the  existence  of  bidding  clubs  benefits 
all  agents  (including  both  agents  inside  and  outside  of  a  bidding  club)  in  several  different 
senses. 


1d. Practical job-scheduling mechanisms. [4]

We  considered  the  problem  of  online  real-time  scheduling  of  jobs  on  a  single  processor  in 
an  economic  setting,  in  which  each  job  is  released  to  a  separate,  self-interested  agent.  The 
agent can then delay releasing the job to the algorithm, inflate its length, and declare an
arbitrary  value  and  deadline  for  the  job,  while  the  center  determines  not  only  the 
schedule,  but  also  the  amount  each  agent  must  pay.  For  the  resulting  mechanism  design 
problem,  we  developed  a  mechanism  that  addresses  each  of  these  incentive  issues,  while 
only  increasing  by  one  the  competitive  ratio  previously  known  for  the  non-strategic 
setting.  We  also  proved  a  matching  lower  bound  for  deterministic  mechanisms  that  never 
pay  the  agents. 


1e. Non-cooperative computation. [5]

We  introduced  the  framework  of  non-cooperative  computation  (NCC).  We  considered 
functions  whose  inputs  are  held  by  separate,  self-interested  agents.  We  also  considered 
four components of each agent's utility function: (a) the wish to know the correct value of
the function, (b) the wish to prevent others from knowing it, (c) the wish to prevent others
from  knowing  one's  own  private  input,  and  (d)  the  wish  to  know  other  agents'  private 
inputs.  We  provided  an  exhaustive  game-theoretic  analysis  of  all  24  possible 
lexicographic  orderings  among  these  four  considerations,  for  the  case  of  Boolean 
functions  (mercifully,  these  24  cases  collapse  to  four).  In  each  case  we  identified  the  class 
of  functions  for  which  there  exists  an  incentive-compatible  mechanism  for  computing  the 
function. 

This  work  has  applications  both  to  cryptographic  settings  and  to  utility  elicitation.  If  a 
function  is  not  computable  in  the  NCC  setting,  then  no  cryptographic  protocol  can  be 
developed  for  interactions  between  self-interested  agents.  One  application  to  utility 
elicitation  is  when  the  input  to  the  function  is  each  agent’s  utility  for  the  events  it  has 
experienced  (e.g.,  movies  seen  or  books  read).  The  function  that  each  agent  is  trying  to 
compute is then a recommendation based on the opinions of all other agents. The
key  point  is  that,  in  order  to  construct  useful  protocols  for  eliciting  utility,  one  has  to  be 
aware  of  agents’  meta-preferences  expressing  their  privacy,  their  desire  to  deceive,  etc. 
Both of these applications fall into a new area that we termed informational mechanism
design,  in  which  the  utility  functions  of  the  self-interested  agents  depend  only  on  the  state 
of  knowledge  of  all  agents  at  the  end  of  the  mechanism. 

1f. Multi-party computation. [6]

We  studied  the  problem  of  using  a  “busy  center”  to  design  a  mechanism  that  encourages 
rational  actors  to  play  a  game  of  complete  information  that  achieves  an  outcome  that  a 
center  prefers  without  involving  the  center  in  the  mechanism  in  any  way  on  the 
equilibrium path. We examined both the case in which agents’ actions are observable to
the center and the case in which they are not. We showed that “busy center” mechanisms
can be transformed into agent-resolved mechanisms with the assistance of a trusted
third-party bank which is ignorant of the mechanism being executed. These ideas are also applicable
in  a  constant-round  multi-party  rational  exchange  protocol. 

2.  Multi-agent  adaptation: 

Many,  indeed  most,  multiagent  interactions  do  not  admit  easy  equilibrium  analysis,  and 
indeed  the  harder  the  analysis  the  less  clear  it  is  that  the  concept  of  equilibrium  is  relevant 
either descriptively or prescriptively. This is particularly true of situations that take
place over long periods of time. What is more common in complex situations is for agents to
consider  a  simple  version,  or  aspect,  of  the  game  being  played,  but  learn  over  time  to 
improve  their  model  and  strategies.  Indeed,  effective  adaptation  is  the  topic  of  many 
research  activities  in  many  academic  disciplines.  Most  of  the  work  concerns  single-agent 
learning,  in  the  context  of  a  passive  environment;  the  environment  may  be  stochastic  and 
unpredictable, but it is not "strategic" with its own set of objectives. Things become
even  more  challenging  when  multiple  agents  adapt  simultaneously,  since  one  agent's 
adaptation  process  impacts  the  other  agent’s  optimal  adaptation.  Indeed,  learning  has  been 
a  very  active  area  within  game  theory,  and  very  recently  also  within  computer  science 
(mostly AI). We have tried to extend AI-style reinforcement learning to the multiagent
setting.


2a.  Planning  in  games. 

We  developed  a  new  anytime  algorithm  for  computing  good  policies  in  many-agent 
systems.  Most  of  the  previous  work  on  multi-agent  systems  tackles  problems  with  either 
two agents or a very large number of agents. We focused on problems with an intermediate
number of agents. Our first results show how to solve the planning problem when playing
against  known  rationally  bounded  opponents,  where  the  computation  of  the  optimal 
policy  may  be  intractable  due  to  the  huge  state-space.  We  identified  a  lattice  of  problems 
where  at  one  extreme  we  consider  all  the  opponents  and  compute  the  true  best  response, 
and  at  the  other  extreme  we  consider  none  of  them  and  compute  the  maxmin  strategy.  In 
between  the  two  extremes  we  can  explicitly  reason  about  a  given  subset  of  the  opponents 
and  treat  the  others  as  worst-case  adversaries.  We  identified  a  vertex  on  the  lattice  that 
gives  the  best  policy  given  computational  restrictions.  Based  on  our  experimental  results, 
we  also  proposed  two  ways  of  heuristically  speeding  up  optimal  value  computations. 

2b.  Computing  best-response  strategies  via  sampling.  [7] 

We  investigated  the  problem  of  computing  a  best-response  to  an  opponent’s  strategy, 
when  this  strategy  is  not  known  exactly  but  can  instead  be  sampled.  For  instance,  a 
government  might  capture  some  members  of  a  terrorist  organization  and  learn  samples  of 
their strategies. It can then use this sample to estimate the strategies of the remaining
agents. A similar example involves studying the code/strategy of computer viruses and
using this information to design effective security against new viruses. We derived
analytical results on the number of samples required in order to approximate the optimal
best  response.  We  then  showed  experimentally  that  convergence  to  the  best  response 
often  occurs  much  more  quickly  than  is  predicted  by  formal  guarantees.  Finally,  we  went 
beyond  so-called  “oblivious  sampling”,  i.e.  we  considered  what  happens  if  the  opponent 
is  aware  that  the  agent  has  taken  the  samples,  if  the  agent  knows  that  the  opponent  is 
aware,  and  so  on  to  higher  levels  of  mutual  modeling. 
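
The oblivious-sampling idea can be illustrated with a minimal sketch. It assumes a finite two-player game, estimates the opponent's mixed strategy by simple action frequencies, and best-responds to that estimate. The function names, the payoff layout, and the toy payoff numbers are hypothetical and are not taken from [7].

```python
from collections import Counter

def empirical_strategy(samples, actions):
    """Estimate the opponent's mixed strategy as observed action frequencies."""
    counts = Counter(samples)
    total = len(samples)
    return {action: counts[action] / total for action in actions}

def best_response(payoff, opponent_strategy):
    """Return our action with the highest expected payoff against the estimate.

    payoff[my_action][opponent_action] is our payoff (assumed layout).
    """
    def expected(my_action):
        return sum(prob * payoff[my_action][opp]
                   for opp, prob in opponent_strategy.items())
    return max(payoff, key=expected)

# Toy 2x2 game with made-up payoffs.
payoff = {
    "defend_A": {"attack_A": 1.0, "attack_B": -1.0},
    "defend_B": {"attack_A": -1.0, "attack_B": 1.0},
}
samples = ["attack_A", "attack_B", "attack_B", "attack_B"]
estimate = empirical_strategy(samples, ["attack_A", "attack_B"])
choice = best_response(payoff, estimate)
```

With more samples the frequency estimate approaches the true strategy, which is the quantity the sample-complexity results bound. The sketch does not model an opponent who knows it is being sampled.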


3.  Algorithms  and  representations  for  game  theory: 

Game  theory  is  a  very  natural  tool  for  modeling  multiagent  interactions.  In  our  work  on 
mechanism  design  we  have  shown  how  to  inject  incentives  into  computational  settings  in 
order  to  make  agents  behave  the  way  we  want.  In  the  work  on  multiagent  adaptation,  we 
have  studied  how  one  might  devise  artificial  agents  that  act  in  such  environments.  A 
complementary  direction  is  to  study  the  problem  of  automatically,  but  centrally, 
reasoning  about  game-theoretic  settings.  However,  very  little  work  in  computer  science 
has  been  done  on  algorithms  and  representations  for  game-theoretic  problems.  An 
important  part  of  our  research  has  been  to  fill  this  gap. 

3a. Dispersion games. [8]


We generalized the notion of anti-coordination games to an arbitrary number of actions
and players. In such games players prefer maximally dispersed outcomes. Such games
capture many natural problems, such as division of roles within a team, network load
balancing, resource allocation, and niche selection in economics. We studied, both
empirically and analytically, the behavior of agents in repeated versions of such games
under several decentralized learning rules.
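
As a small illustration, the sketch below checks whether an outcome is maximally dispersed. It uses a working assumption that is not spelled out in the text above: an outcome counts as maximally dispersed when the numbers of agents choosing any two actions differ by at most one. The function name is hypothetical.

```python
from collections import Counter

def is_maximally_dispersed(outcome, actions):
    """True if agents are spread as evenly as possible over the actions.

    outcome: one chosen action per agent.
    actions: all available actions, including unchosen ones.
    Assumption: "maximally dispersed" means per-action loads differ by <= 1.
    """
    counts = Counter(outcome)
    loads = [counts[action] for action in actions]
    return max(loads) - min(loads) <= 1
```

For example, with three actions, the outcome in which two agents pick the same action while the third action stays unused is not maximally dispersed under this assumption.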

3c.  Search  methods  for  finding  Nash  equilibria.  [9] 

Computing a sample Nash equilibrium in a normal-form game is one of the most
interesting  and  most  poorly  understood  problems  in  computer  science.  The  question  of  its 
complexity  still  remains  open.  The  best  algorithms  currently  known  are  based  on 
complicated  mathematics  and  numerical  methods,  and  yet  come  with  no  known 
guarantees  on  their  running  time.  Surprisingly,  none  of  the  classic  AI  search  methods 
have been tried. We filled this gap by formulating this problem as a search problem, and
by coming up with problem relaxations and useful heuristics. The resulting methods, while
extremely simple, outperformed the state-of-the-art Lemke-Howson, Simplicial Subdivision,
and Govindan-Wilson algorithms by orders of magnitude on a wide variety of realistic
game distributions.

3d.  Game  theory  test  suite.  [10] 

In order to achieve our goal of a coherent theory of decision- and game-theoretic
models,  it  is  essential  to  have  a  solid  collection  of  interesting  settings  on  which  the  new 
ideas can be evaluated. In our case, many such settings can be modeled as
simultaneous-action games. We have combed through the literature in game theory,
economics, CS, AI, psychology, and political science, and compiled an extensive
database of classes of games in normal form that researchers have found to be interesting
and useful. We have elucidated relations between many such classes by organizing them
into a unified taxonomy of games. We also implemented generators for many of these
games in a test suite called GAMUT. GAMUT has already become the definitive test suite
for testing game-theoretic algorithms and agents.

3e.  Multi-agent  target  surveillance. 

We  developed  a  novel  approach  to  solve  a  real-time  multiagent  target  surveillance 
problem.  The  problem  consists  of  a  set  of  agents  and  a  set  of  targets  on  a  map,  each  of 
which  must  be  visited  by  an  agent.  The  objective  function  is  the  total  time  spent  by  the 
agents.  There  exists  software  to  solve  this  problem  optimally;  however,  in  general  the 
computational  time  is  large.  On  the  other  hand,  there  are  several  different  algorithms 
which quickly produce an approximate solution. Our overall objective is to minimize the
sum of the computation time and the execution time. Our approach is to first run the
approximate algorithms. Then, using our empirical hardness methodology, we predict both
the improvement in solution quality that would result from finding the optimal solution
and the time that it would take to find this solution. Only if the improvement is worth
the computation time do we execute the optimal solver. Experimentally, this approach
outperformed both always using the approximate solution and always finding the optimal
solution.
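
The decision rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the two predicted quantities are supplied by some learned model (not shown), all times are in the same units, and the function and parameter names are hypothetical.

```python
def choose_plan(approx_execution_time,
                predicted_optimal_execution_time,
                predicted_solver_time):
    """Pick the plan that minimizes computation time plus execution time.

    The approximate solution is assumed to be already computed, so its
    remaining cost is just its execution time. Switching to the optimal
    solver pays off only if the predicted saving in execution time exceeds
    the predicted time needed to compute the optimal solution.
    """
    predicted_saving = approx_execution_time - predicted_optimal_execution_time
    if predicted_saving > predicted_solver_time:
        return "optimal"
    return "approximate"
```

For instance, with an approximate plan of 100 time units, a predicted optimal plan of 70, and a predicted solver time of 20, the rule runs the optimal solver, because 70 + 20 is less than 100.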

3f.  Representation  and  complexity  of  cooperative  games.  [11] 

While most of our effort has focused on non-cooperative games, cooperative game theory
is equally important, but much less well understood from the computational point of view.
Cooperative  game  theory  studies  the  interaction  of  agents  that  have  the  capability  of 
enforcing  contracts,  and  has  strong  ties  to  basic  microeconomic  theory.  We  have 
introduced  several  natural  compact  representations  of  coalitional  games,  and  studied  the 
complexity  of  reasoning  about  games  in  these  representations. 

4.  Empirical  hardness  models: 

Many  natural  computational  problems  associated  with  game  theory  and  economics 
models  turn  out  to  be  NP-hard.  Nevertheless,  it  is  important  to  develop  methods  for 
solving  practical  instances  of  these  problems.  Thus,  it  is  necessary  to  be  able  to 
empirically  understand  factors  that  make  particular  problems  hard  for  existing 
algorithms. We have proposed a novel methodology for doing this. In our methodology,
we compute features of hard problem instances, and then use machine-learning techniques,
specifically regression, to construct statistical models of algorithms' behavior.
We have constructed such models for algorithms in two domains: the winner-determination
problem for combinatorial auctions, and the problem of determining satisfiability of a
propositional formula. In both cases, we showed that very accurate predictions of
runtimes can be obtained. Also, by analyzing the learned models, we were able to zero in
on some very specific, and sometimes surprising, sources of hardness. We have also used
this methodology as a basis for our multi-agent target surveillance approach discussed
in 3e above.
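
A minimal sketch of the regression step is given below. It makes several assumptions that are not stated in the text: a single numeric instance feature, ordinary least squares, and the logarithm of runtime as the regression target. The data points are made up for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: one feature value and one measured runtime
# (in seconds) per problem instance.
features = [1.0, 2.0, 3.0, 4.0]
runtimes = [10.0, 100.0, 1000.0, 10000.0]

# Regress on log10(runtime), since runtimes span orders of magnitude.
log_runtimes = [math.log10(t) for t in runtimes]
slope, intercept = fit_line(features, log_runtimes)

def predict_runtime(feature):
    """Predicted runtime in seconds for an unseen instance."""
    return 10 ** (slope * feature + intercept)
```

Inspecting the fitted coefficients is the single-feature analogue of analyzing a learned model for sources of hardness: a large slope means the feature is strongly associated with longer runtimes.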

4a.  Algorithm  selection  and  algorithm  portfolios.  [12, 13] 

A  lot  of  the  problems  that  naturally  arise  in  AI  and  game  theory  are  hard.  It  is  often  the 
case, especially for NP-hard problems, that different algorithms perform well on
completely  different  problem  instances.  Therefore,  in  order  to  obtain  practical  robust 
solution  methods,  several  existing  algorithms  must  be  combined  into  a  single  portfolio. 
The  problem  of  selecting  the  right  algorithm  on  a  per-instance  basis  was  first  recognized 
by  Rice  in  1973.  Surprisingly  little  has  been  done  since,  with  most  people  simply  running 
the  algorithm  that  is  best  on  average.  We  have  demonstrated  that  our  empirical  hardness 
models  can  be  used  to  select  the  right  algorithm  very  effectively.  For  the  winner 
determination  problem,  a  portfolio  that  included  two  additional  inferior  algorithms 
outperformed the best algorithm by a factor of 3. Our SAT portfolio performed very
favorably in the SAT solver competition, being the only solver that placed near the top in
more than one competition track.
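
Per-instance algorithm selection can be sketched as follows, assuming one runtime model per algorithm is already available. The solver names, the feature name, and the linear models are hypothetical stand-ins for learned empirical hardness models.

```python
def select_algorithm(features, runtime_models):
    """Return the algorithm with the smallest predicted runtime.

    runtime_models maps an algorithm name to a function that takes the
    instance features and returns a predicted runtime.
    """
    predictions = {name: model(features)
                   for name, model in runtime_models.items()}
    best = min(predictions, key=predictions.get)
    return best, predictions

# Hypothetical models: solver_a is fast on small instances, solver_b
# has a higher fixed cost but scales better.
runtime_models = {
    "solver_a": lambda f: 2.0 + 0.5 * f["num_bids"],
    "solver_b": lambda f: 10.0 + 0.1 * f["num_bids"],
}

small_choice, _ = select_algorithm({"num_bids": 10}, runtime_models)
large_choice, _ = select_algorithm({"num_bids": 100}, runtime_models)
```

The point of the example is that an algorithm that is worse on average (here `solver_b` on small instances) can still improve the portfolio, because it is chosen only where it is predicted to win.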

4b. Hard instance distributions. [12, 13]


In  light  of  the  success  of  algorithm  portfolios,  it  becomes  extremely  important  to  come  up 
with  new  hard  distributions  of  problem  instances,  since  new  algorithms  are  only  useful  to 
the portfolio if they perform well where the current portfolio performs poorly. We have
shown how our hardness models, in conjunction with rejection sampling, can be used to
generate hard instances. We have tested this approach on CATS, the standard generator
of winner-determination problem (WDP) instances for combinatorial auctions. Using our
models to guide instance
generation,  even  CATS’  easiest  distributions,  such  as  matching  and  scheduling,  now 
routinely  output  instances  that  are  much  harder  than  anything  that  we’ve  observed 
previously. 
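
The combination of a hardness model with rejection sampling can be sketched as below. This is an illustration under assumptions, not the CATS implementation: instances are represented by a single integer, the base generator and the acceptance rule are toy stand-ins, and all names are hypothetical.

```python
import random

def sample_hard_instances(generate, acceptance_probability, count, rng,
                          max_tries=100000):
    """Rejection sampling: draw from the base generator and keep each
    instance with a probability that grows with its predicted hardness."""
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == count:
            break
        instance = generate(rng)
        if rng.random() < acceptance_probability(instance):
            accepted.append(instance)
    return accepted

def generate(rng):
    """Toy base distribution: an instance is just a size from 0 to 9."""
    return rng.randint(0, 9)

def acceptance_probability(size):
    """Toy hardness model scaled to [0, 1]; larger means predicted harder."""
    return size / 9

rng = random.Random(0)
hard = sample_hard_instances(generate, acceptance_probability, 20, rng)
```

The accepted instances follow the base distribution reweighted toward instances the model predicts to be hard, so the generator's overall structure is preserved while easy instances become rare.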

Abstract

We introduce a new notion, related to auctions
and mechanism design, called fair imposition.1
In this setting a center wishes to fairly and efficiently
allocate tasks among a set of agents, not
knowing their cost structure. As in the study of
auctions, the main obstacle to overcome is the
self-interest of the agents, which will in general
cause them to hide their true costs. Unlike
the auction setting, however, here the center
has the power to impose arbitrary behavior
on the agents, and furthermore wishes to
distribute the cost as fairly as possible among
them. We define the problem precisely, present
solution criteria for these problems (the central
one of which is called k-efficiency), and present
both positive results (in the form of concrete
protocols) and negative results (in the form of
impossibility theorems) concerning these criteria.


1  Introduction 

Allocation  of  resources  or,  isomorphically,  of  tasks  is 
among  the  fundamental  problems  in  computer  science, 
operations  research,  economics,  and  other  scientific  and 
technological  disciplines.  In  a  centralized  task  allocation 
problem  there  is  a  center  whose  aim  is  to  allocate  one  or 
more  tasks  among  several  available  agents  (be  they  ma¬ 
chines,  processors,  servers,  employees,  companies,  etc.). 
Several  aspects  of  the  situation  strongly  impact  the  prob¬ 
lem.  The  two  aspects  we  focus  on  are: 

1.  The information available to the center about the
agents’ costs for performing the tasks.

2.  The center’s ability to impose tasks and payments
on the agents.


*The  permanent  address  of  the  second  author  is:  Faculty 

In classical work on centralized optimization in CS and
OR the assumption is that the center has perfect information
about, and perfect control over, the agents; usually
payments don’t figure in at all, and the center imposes
on the agents an optimal protocol computed by
the center. In economics and game theory, for the most
part, the opposite obtains, namely the center (auctioneer,
procurer) has no knowledge of the agents’ private information,
and furthermore has no power to enforce any
behavior on the agents. Indeed, the individual freedom
to decide whether to transact, and under what terms to
do so, lies at the core of what one usually understands
a ‘market’ to mean. For this reason, payments are the
primary means of inducing agents to exhibit any sort of
behavior in such a setting.

We introduce an intermediate setting in which the center
has full power over the agents, but no access to their
private information. We furthermore assume that the
center is a benign dictator, which wishes not only to
achieve the tasks but to do so in a way that is socially
fair. The problem is that agents will in general not volunteer
correct information that would allow the center
to achieve these objectives, and thus the center must resort
to incentive engineering of the sort encountered in
mechanism design in game theory.

This broad category of problems was motivated by
a specific problem encountered by the military, in the
Virtual Transportation Company [VTC] project [Godfrey
and Mifflin, 2000]. In most, if not all, democratic
countries the state has the power at times of emergency
to commandeer aircraft and other resources required to
deal with the crisis. Of course, in a democratic country
the state recognizes the rights of companies such as airline
carriers, and attempts to compensate them for such
use. The problem is how to decide which carriers to tap,
and how to compensate the parties affected. Ideally, the
state would like to achieve the following:

1.  Acquire the required types and number of aircraft.

2.  Minimize (in some cases, eliminate) its own costs.

3.  Minimize the total true costs to the carriers tapped.

4.  Distribute the cost fairly among all the carriers,
those tapped and not.

While our treatment of the problem going forward will
be entirely abstract, and indeed make some assumptions
that are not consistent with this application (in particular,
that the military wishes to pay less than market
value for the services rendered), the reader might keep
this example in mind throughout the paper.

The rest of the paper is organized as follows. In section 2
we define the procurement problem. The procurement
problem is isomorphic to the problem of task
allocation (or the auctioning of a good). In section 3
we discuss the fair imposition of tasks. In particular, we
introduce the notion of k-efficiency, which is central for
the evaluation of procurement protocols in the context of
fair allocation. Sections 4-8 present a set of basic results
dealing with the feasibility of fair allocation of tasks. We
present several upper bounds (by means of impossibility
results) on the level of fairness and the minimization of
expenses that can be obtained, as well as protocols for
fair task allocation. Further discussion of the contribution
of our work in the context of general task allocation
and protocols for non-cooperative environments can be
found in section 9.

2  The  procurement  problem 

In this section we provide the reader with the requisite
knowledge of task allocation/procurement theory.
Basic procurement theory has much similarity with basic
auction theory. Following the related basic procurement/auction
theory literature, we restrict ourselves to a
1-shot, single-item procurement problem. We later (see
Section 8) weaken the second restriction, and in future
work deal with the first one.

Consider a center who wishes to obtain a particular
service, where there are n potential suppliers, or agents,
denoted by 1, 2, ..., n, who may supply this service. A
procurement protocol is a procedure in which participants
submit messages (typically monetary bids) for providing
the service. The protocol’s rules specify the type
of messages, and as a function of the messages submitted
by the participants they determine the service provider
and the payments to be made by the participants. The
payments may be positive, negative, or zero.

Formally, a procurement procedure for $n$ potential
participants, $N = \{1, 2, \ldots, n\}$, is characterized by 4
parameters, $M, g, c, d$, where $M$ is the set of messages,
$g = (g_1, \ldots, g_n)$ with $g_i : M^n \to [0,1]$ for all $i$ and
$\sum_{i=1}^{n} g_i(m) \le 1$ for all $m$, and $c = (c_1, \ldots, c_n)$,
$d = (d_1, \ldots, d_n)$ with $c_i, d_j : M^n \to \mathbb{R}$ for all $i, j$.
The interpretation of these elements is as follows. Participant $i$
submits a message $m_i \in M$. Let $m = (m_1, m_2, \ldots, m_n)$
be a vector of messages; then the organizer conducts a
lottery to determine the service provider, in which the
probability that $i$ is the winner equals $g_i(m)$. The winner,
say $j$, is paid $c_j(m)$ and every other participant $i$
pays $d_i(m)$. The classical theory of procurement associates
a (Bayesian) game with each procurement procedure
and analyses the behavior of the agents under
the equilibrium assumption, as described below. Each
agent has a type, $v_i$, selected from a set of possible types
$V$. The type of agent $i$, $v_i$, is known to it, but might
not be known to the other agents. The type $v_i$ should
be interpreted as the cost for agent $i$ in providing the
required service. A strategy for agent $i$ is a function
$b_i : \mathbb{R}_+ \to M$, where $b_i(v_i)$ is the message submitted by
$i$ when his type is $v_i$. Each agent $i$ has a utility function
$u_i$. Assuming that the agents submitted the tuple of
messages $m = (m_1, m_2, \ldots, m_n)$, the service will be allocated
based on $g$. The utility of agent $i$ will be $c_i(m) - v_i$
if it has been selected as the service provider, and otherwise
its utility will be $-d_i(m)$. A tuple of strategies
$b = (b_1, b_2, \ldots, b_n)$ will be called an equilibrium if for every
agent $i$, $b_i$ is the best response against the strategies
of the other agents in $b$ (denoted by $b_{-i}$). A strategy $b_i$ of
agent $i$ will be called a dominant strategy if for any tuple
of strategies $b_{-i}$ of the other agents, and for any strategy
$b'_i$ of agent $i$, we have that $u_i(b_i, b_{-i}) \ge u_i(b'_i, b_{-i})$.

3  From  procurement  to  fair  imposition 

In the setup discussed in this paper, the center can force
the agents to provide a desired service, as well as monetary
payments. However, the center may wish to get the
desired service while attempting to minimize the agents’
costs.

Definition 1 Given a procurement setting, with $n$ possible
service providers, $N = \{1, 2, \ldots, n\}$, and costs selected
from the set $V$, a procurement protocol $(M, g, c, d)$,
where $M = V$ is the set of messages (possibly declared
costs), $g$ is the allocation function, $c$ determines the
service provider payments, and $d$ determines the other
agents’ payments, is called incentive compatible if for
any cost $v_i \in V$ of agent $i$, its payoff is maximized by
sending the message $m_i = v_i$ regardless of the messages
of the other agents (i.e., truth revealing is a dominant
strategy).
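
To make the definition concrete, the following sketch checks incentive compatibility by brute force on a finite grid of costs. It is an illustration under assumptions: the two example protocols (a second-price and a first-price procurement) are standard textbook rules chosen for contrast and are not the protocols developed in this paper; both are deterministic, and non-winners pay nothing, i.e. $d_i(m) = 0$. All function names are hypothetical.

```python
from itertools import product

def lowest_bidder(bids):
    """Index of the lowest bid; ties go to the lowest index."""
    return min(range(len(bids)), key=lambda i: (bids[i], i))

def second_price(bids):
    """Winner is the lowest bidder and is paid the lowest competing bid."""
    winner = lowest_bidder(bids)
    payment = min(b for i, b in enumerate(bids) if i != winner)
    return winner, payment

def first_price(bids):
    """Winner is the lowest bidder and is paid its own bid."""
    winner = lowest_bidder(bids)
    return winner, bids[winner]

def utility(protocol, agent, true_cost, bids):
    """c_i(m) - v_i for the winner, and 0 for everyone else (d_i = 0)."""
    winner, payment = protocol(bids)
    return payment - true_cost if agent == winner else 0

def is_incentive_compatible(protocol, costs, n_agents):
    """Check that declaring the true cost is a dominant strategy when
    costs and messages both range over the finite grid `costs` (M = V)."""
    for agent in range(n_agents):
        for true_cost in costs:
            for others in product(costs, repeat=n_agents - 1):
                def bids_with(message):
                    return list(others[:agent]) + [message] + list(others[agent:])
                truthful = utility(protocol, agent, true_cost, bids_with(true_cost))
                for message in costs:
                    if utility(protocol, agent, true_cost, bids_with(message)) > truthful:
                        return False
    return True
```

On a small grid such as `[1, 2, 3]`, the second-price rule passes the check, while the first-price rule fails it, because a winner paid its own bid can gain by overstating its cost.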

In the sequel we will restrict ourselves to incentive compatible
protocols. Incentive compatible protocols have
several desired properties. In particular, they do not
build on any rationality assumption on the behavior of
other agents. This implies that an agent can refrain from
adopting and computing probabilistic information about
the other agents’ behavior when employing a dominant
strategy.

In order to introduce a basic definition of fair imposition of protocols we
use the following idea. We wish first to ensure that the center will minimize
its expenses due to the performance of the service. Given that, we would like
to minimize the expenses of each and every agent. This leads to the following
central definition:

Definition 2 Given a procurement setting $S$ with $n$ agents, a procurement
protocol $P$ is called $k$-efficient if the following holds:

1. $P$ is incentive compatible.

2. For any tuple of costs $(v_1, v_2, \ldots, v_n)$ where $v_i$ is the cost of
agent $i$, we have that:

(a) The sum of payments from the center to the agents is non-positive.

(b) The cost to agent $i$ is no more than $\frac{v_{[k]}}{n}$, where $v_{[k]}$
is the $k$ order statistic of $\{v_1, v_2, \ldots, v_n\}$.

Notice that our definition of fairness captures the desire to minimize
expenses of each particular agent. Notice that we have required that the
center's budget be non-negative. The case where the procurer expects to pay
for its services is discussed in Section 7. In the sequel we will denote the
agent with the $i$-th lowest valuation by $a_i$ and its valuation by $v_{[i]}$.
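
Conditions 2(a) and 2(b) can be checked mechanically on a single truthful cost profile. The following is a minimal sketch, not part of the original paper. It assumes that "the cost to agent $i$" means the agent's net expense (what it pays the center, plus its service cost and minus what it receives if it is the provider). The function names `kth_order_statistic` and `meets_k_bounds` are hypothetical. Incentive compatibility (condition 1) is not checked here.

```python
from typing import Sequence


def kth_order_statistic(costs: Sequence[float], k: int) -> float:
    """Return the k-th lowest cost (k is 1-indexed), as in footnote 2."""
    return sorted(costs)[k - 1]


def meets_k_bounds(
    costs: Sequence[float],
    provider: int,
    payment_to_provider: float,
    fees: Sequence[float],
    k: int,
    tol: float = 1e-9,
) -> bool:
    """Check conditions 2(a) and 2(b) of Definition 2 on one cost profile.

    fees[i] is what agent i pays the center; the provider additionally
    bears its own cost and receives payment_to_provider (assumed reading).
    """
    n = len(costs)
    # 2(a): net payments from the center to the agents must be non-positive.
    center_outflow = payment_to_provider - sum(fees)
    # 2(b): each agent's net expense is at most v_[k] / n.
    bound = kth_order_statistic(costs, k) / n
    expenses = list(fees)
    expenses[provider] += costs[provider] - payment_to_provider
    return center_outflow <= tol and all(e <= bound + tol for e in expenses)
```

For example, with costs `[10, 20, 30, 40]`, provider `0` paid `20`, and fees `[7.5, 7.5, 5, 5]`, the bounds hold for `k=3` but not for `k=2`.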

4  1-efficiency 

Work on economic mechanism design in game theory [Fudenberg and Tirole, 1991]
usually adopts several basic requirements. One of these basic requirements is
that protocols should be economically efficient. We will use the following
standard definition:

Definition 3 Given a procurement setting $S$ with $n$ agents, a procurement
protocol $P$ will be called economically efficient if it always selects the
agent with the lowest cost to serve as the service provider.

We  can  now  prove: 

Lemma 1 Given a procurement setting $S$ with $n$ agents, a procurement
protocol $P$ is 1-efficient only if it is economically efficient.

Proof (sketch): Assume that the protocol chooses the agent with the second
lowest valuation as the service provider. Recall we denote the agent with the
$i$-th lowest valuation by $a_i$ and its valuation by $v_{[i]}$. In order to
get 1-efficiency, agents $a_1, a_3, a_4, \ldots, a_n$ need to suffer a cost of
at most $\frac{1}{n} v_{[1]}$. This implies that if $a_2$ is the service
provider, then it will suffer an expense of at least
$v_{[2]} - \frac{n-1}{n} v_{[1]} > \frac{1}{n} v_{[1]}$, which contradicts
1-efficiency. A similar argument holds when the agent who will provide the
service is $a_j$, $j > 2$.
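
The inequality in this sketch can be spelled out as follows. This is an illustrative derivation rather than the authors' own. It assumes $v_{[1]} < v_{[2]}$ and writes $e_i$ (a symbol introduced here) for the net expense of agent $a_i$ when $a_2$ is the provider.

```latex
\begin{align*}
\sum_{i \neq 2} e_i &\le \frac{n-1}{n}\, v_{[1]}
  && \text{(1-efficiency bound on the other } n-1 \text{ agents)} \\
\sum_{i=1}^{n} e_i &\ge v_{[2]}
  && \text{(non-negative center budget; the service costs } v_{[2]}\text{)} \\
e_2 &\ge v_{[2]} - \frac{n-1}{n}\, v_{[1]}
  \;>\; v_{[1]} - \frac{n-1}{n}\, v_{[1]} \;=\; \frac{1}{n}\, v_{[1]}
\end{align*}
```

The last line exceeds the 1-efficiency bound $\frac{1}{n} v_{[1]}$ for $a_2$, which is the contradiction the sketch refers to.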

The above lemma teaches us that in order to have 1-efficiency we must have
economic efficiency. However, we now show:

Lemma  2  There  is  no  protocol  that  is  both  1-efficient 
and  economically  efficient. 

Proof (sketch): In order to have 1-efficiency, agents $a_2, a_3, \ldots, a_n$
should suffer an expense of at most $\frac{1}{n} v_{[1]}$. Since the center
may not have a negative balance, and since we require 1-efficiency, and since
the cost of the service for $a_1$ is $v_{[1]}$, we get that the payments are
as follows: each agent $a_i$, $i \geq 2$, pays exactly $\frac{1}{n} v_{[1]}$
to the center, who collects these payments and transfers them to $a_1$.
However, such an incentive compatible efficient protocol, where the sum of
payments is 0, does not exist (see [Mas-Colell et al., 1995], page 880).3

2 The $k$ order statistic of a set is the $k$-th lowest element in this set.

3 This is no longer the case if we consider Bayesian equilibrium
[d'Aspremont and Gerard-Varet, 1979].


Hence, as we have seen, economic efficiency is essential for 1-efficiency, but
1-efficiency and economic efficiency are contradicting in our setup. Hence, we
get:

Theorem 1 There is no 1-efficient procurement protocol.

5  2-efficiency 

Given the impossibility of 1-efficiency, the next desirable alternative is to
consider the case of 2-efficiency. As before, we are first interested in the
connections between 2-efficiency and economic efficiency. We can show:

Lemma 3 Given a procurement setting $S$ with $n$ agents, a procurement
protocol $P$ is 2-efficient only if it is economically efficient.

Proof (sketch): Assume that the protocol assigns the good to agent $a_2$.
Notice that in order to obtain 2-efficiency all other agents will have to pay
exactly $\frac{1}{n} v_{[2]}$. This implies that agent $a_2$ will suffer an
expense of $\frac{1}{n} v_{[2]}$. Hence, $a_2$ may go ahead and report a cost
$v'$, where $v_{[1]} < v' < v_{[2]}$. Consider now the tuple of types
$(v_{[1]}, v', v_{[3]}, \ldots, v_{[n]})$. Regardless of who is the agent to be
allocated the service for this valuation (whether this is agent $a_2$ or
another agent), 2-efficiency will imply that agent $a_2$ will suffer an
expense of no more than $\frac{1}{n} v'$. Hence, the above deviation is
rational, which contradicts incentive compatibility. It is immediate to see
that an allocation of the service to agent $a_j$, $j \geq 3$, cannot lead to
2-efficiency. Hence, the allocation needs to be efficient.

Thus again we would like to know whether economic efficiency and 2-efficiency
are compatible. Unfortunately, at this time we do not know, so instead we give
a weaker result. Consider the following property defined on procurement
protocols.

Definition 4 Given a procurement setting $S$ with $n$ agents, a procurement
protocol is called unbiased if for every tuple of agents' types the payments
by all agents who do not provide the service are identical, and the following
property does not hold:

• For any tuple of agents' types, the service provider's expenses are greater
than or equal to the expenses of all other agents, and for at least one tuple
of types its expense is greater than the expenses of all other agents.

Thus a biased procurement protocol either favors the agents that do not
provide the service, or differentiates among them. We can now show:

Theorem 2 Given a procurement setting $S$ with $n$ agents, there is no
unbiased 2-efficient procurement protocol.

Proof (sketch): First, given the previous lemma, we can restrict our attention
to protocols that guarantee efficient allocation. Notice that for any tuple of
types, either all agents' expenses are exactly $\frac{1}{n} v_{[1]}$, or there
should be at least one agent who suffers expenses of more than
$\frac{1}{n} v_{[1]}$ (otherwise the center will suffer losses). Since the
protocol is unbiased there must be at least one tuple of types for which agent
$a_2$ pays $\frac{1}{n} v_{[1]} + \epsilon$, for some $\epsilon > 0$. Assume
that for this tuple of types, $a_2$ reports the cost $v_{[1]} + \delta$, where
$\delta > 0$ and $\frac{v_{[1]} + \delta}{n} < \frac{1}{n} v_{[1]} + \epsilon$
(which is satisfied for any $\delta < n\epsilon$). We get that, since the
allocation needs to be economically efficient (and therefore this modification
will not change the allocation of service) and since the protocol is
2-efficient, such a deviation will decrease the expense of agent $a_2$:
considering the behavior of the protocol on the tuple
$(v_{[1]}, v_{[1]} + \delta, v_{[3]}, \ldots, v_{[n]})$, we get that agent $a_2$
will pay no more than
$\frac{v_{[1]} + \delta}{n} < \frac{1}{n} v_{[1]} + \epsilon$. This
contradicts incentive compatibility. This implies that there is no unbiased
2-efficient protocol.


6  3-efficiency 

The negative results of the previous sections teach us that in order to obtain
fair imposition of services we need to consider protocols that are at most
3-efficient. In this section we show that this upper bound can be matched by
an appropriate protocol. Consider the following protocol.

Fair3:

1. Each supplier is asked to reveal its costs to the center.

2. The task will be allocated to the agent who has announced the lowest cost;
this agent will be paid the second lowest cost announced.

3. Each supplier will pay to the center $\frac{1}{n}$ of the second lowest
reported cost of the other participants.

Notice that this protocol (as well as protocol Fair3b, to be discussed in
Section 8) makes use of the Groves scheme for mechanism design [Groves, 1973].
We can now show:


Proposition 1 Given a procurement setting $S$ with $n$ agents, Fair3 is a
3-efficient protocol for that setting.


Proof (sketch): Notice that if every agent reports its actual costs, then the
payments by $a_1$ and $a_2$ are $\frac{v_{[3]}}{n}$, and the payment by $a_j$,
$j \geq 3$, is $\frac{v_{[2]}}{n}$. We need to show that truth revealing is a
dominant strategy. However, since the payments here fit the Groves scheme (see
[Groves, 1973]) we get that the protocol is also incentive compatible.
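
The sketch leaves the budget condition implicit. A short computation suggests why the center does not lose money under Fair3. It assumes $n \geq 3$ and truthful reports, so that $a_1$ and $a_2$ each pay $\frac{v_{[3]}}{n}$, every $a_j$ with $j \geq 3$ pays $\frac{v_{[2]}}{n}$, and the provider $a_1$ is paid $v_{[2]}$.

```latex
\begin{align*}
\text{received by the center} &= 2 \cdot \frac{v_{[3]}}{n} + (n-2) \cdot \frac{v_{[2]}}{n}, \\
\text{paid by the center} &= v_{[2]}, \\
\text{received} - \text{paid} &= \frac{2\,(v_{[3]} - v_{[2]})}{n} \;\geq\; 0 .
\end{align*}
```

The provider $a_1$ then has net expense $v_{[1]} - v_{[2]} + \frac{v_{[3]}}{n} \leq \frac{v_{[3]}}{n}$, so under this reading every agent's expense stays within the 3-efficiency bound.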

As we can see, if we are willing to settle for 3-efficiency, then quite fair
task allocation for minimizing the agents' actual expenses can be obtained.
The center will be able to obtain the desired behavior without suffering any
cost.
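
The three steps of Fair3 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names `fair3` and `net_expenses` are hypothetical, ties are broken by agent index (the paper does not specify a tie-breaking rule), and at least three agents are required so that a second lowest cost among the others exists.

```python
from typing import List, Sequence, Tuple


def fair3(reported: Sequence[float]) -> Tuple[int, float, List[float]]:
    """Run one round of Fair3 on the reported costs.

    Returns (provider index, payment to provider, fee paid by each agent).
    """
    n = len(reported)
    if n < 3:
        raise ValueError("Fair3 needs at least three agents")
    # Step 2: lowest report wins and is paid the second lowest report.
    order = sorted(range(n), key=lambda i: reported[i])
    provider = order[0]
    payment_to_provider = reported[order[1]]
    # Step 3: each agent pays 1/n of the second lowest report of the others.
    fees = []
    for i in range(n):
        others = sorted(reported[j] for j in range(n) if j != i)
        fees.append(others[1] / n)
    return provider, payment_to_provider, fees


def net_expenses(
    true_costs: Sequence[float],
    provider: int,
    payment_to_provider: float,
    fees: Sequence[float],
) -> List[float]:
    """Net expense per agent: fee, plus service cost minus payment for the provider."""
    expenses = list(fees)
    expenses[provider] += true_costs[provider] - payment_to_provider
    return expenses
```

With true costs `[10, 20, 30, 40]` reported truthfully, agent 0 provides the service and is paid 20, the fees are `[7.5, 7.5, 5.0, 5.0]`, and no net expense exceeds $v_{[3]}/n = 7.5$. Because an agent's fee does not depend on its own report, misreporting can only change whether it wins and at what second price, which is the usual second-price argument.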


7  Almost  budget  balanced  protocols 

Given the general results of the previous sections, it may be of interest to
see how far we are from obtaining 2-efficiency. In order to address this issue
we challenge one of the assumptions of our model, namely that in no case is
any cost imposed on the center. Indeed, while in some situations the center
should be expected to pay nothing, in many others it is expected to bear some
costs. In some cases - including, arguably, the VTC domain, which motivated
this work - the expectation is that the service providers experience a
positive surplus. Here, however, we explore the intermediate situation in
which the center is willing to suffer some expense, but a minimal one.

In order to handle the above issue let us assume that V = [a, b] where b > a,
i.e. the agents' costs are in between a and b. In many domains, assuming the
costs are high, we have that b - a ≪ a. For example, the costs to various
airlines of providing a given flight might range from $750K to $800K. The
center may be willing to suffer a payment of $800K - $750K = $50K but not of
$750K or more.

Given  the  above  intuition  we  consider  the  following 
definition: 

Definition 5 Given a procurement setting with $n$ agents, a procurement
protocol $P$ will be called an almost budget balanced $k$-efficient protocol
if the following holds:

1. $P$ is incentive compatible.

2. For any tuple of costs $(v_1, v_2, \ldots, v_n)$ where $v_i$ is the cost of
agent $i$, we have that:

(a) The sum of payments from the center to the agents is at most
$v_{[n]} - v_{[1]}$.

(b) The payment by agent $i$ is no more than $\frac{v_{[k]}}{n}$, where
$v_{[k]}$ is the $k$ order statistic of $\{v_1, v_2, \ldots, v_n\}$.

Consider  the  following  protocol: 

almost2:

1. Each supplier is asked to reveal its cost to the center.

2. The task will be allocated to the agent who has announced the lowest cost;
this agent will be paid the second lowest cost announced.

3. Each supplier will pay to the center $\frac{1}{n}$ of the lowest reported
cost of the other participants.

We  can  now  show: 


Proposition 2 almost2 is an almost budget balanced 2-efficient protocol for
any given procurement setting.


Proof (sketch): Let there be $n$ agents in the setting with valuations
$v_{[1]}, \ldots, v_{[n]}$. Since the payment scheme fits the Groves payments
(see [Groves, 1973]) we get that the protocol is incentive compatible. In
addition, if every agent reports its actual valuation then the payment by
$a_1$ is $\frac{v_{[2]}}{n}$, and the payment by $a_j$, $j \geq 2$, is
$\frac{v_{[1]}}{n}$. The center will pay
$\frac{n-1}{n} v_{[2]} - \frac{n-1}{n} v_{[1]} \leq v_{[n]} - v_{[1]}$.
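
The center's outlay under almost2 can be checked numerically on the airline example. The sketch below is illustrative only: the names `almost2` and `center_outlay` are hypothetical, ties are broken by agent index, and costs are expressed in thousands of dollars.

```python
from typing import List, Sequence, Tuple


def almost2(reported: Sequence[float]) -> Tuple[int, float, List[float]]:
    """Run one round of almost2 on the reported costs.

    Returns (provider index, payment to provider, fee paid by each agent).
    """
    n = len(reported)
    if n < 2:
        raise ValueError("almost2 needs at least two agents")
    # Step 2: lowest report wins and is paid the second lowest report.
    order = sorted(range(n), key=lambda i: reported[i])
    provider = order[0]
    payment_to_provider = reported[order[1]]
    # Step 3: each agent pays 1/n of the lowest report among the others.
    fees = [
        min(reported[j] for j in range(n) if j != i) / n
        for i in range(n)
    ]
    return provider, payment_to_provider, fees


def center_outlay(payment_to_provider: float, fees: Sequence[float]) -> float:
    """Net amount the center pays out after collecting all fees."""
    return payment_to_provider - sum(fees)
```

For reported costs `[750, 760, 780, 800]`, the provider is paid 760, the fees are `[190.0, 187.5, 187.5, 187.5]`, and the center's outlay is 7.5, which equals $\frac{n-1}{n}(v_{[2]} - v_{[1]})$ and is well below $v_{[n]} - v_{[1]} = 50$.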


8  Extension:  the  imposition  of 
interacting  services 

Our study in the previous sections has concentrated on the imposition of a
single task on a set of agents. If there are several tasks, we can deal with
each of those tasks separately, and apply the techniques and results we
previously obtained. Although this may be quite appropriate in many cases, one
may wish to consider more general extensions. In this section, we briefly
consider one extension. We will concentrate on the case of two interacting
services.

Consider two services, 1 and 2. The cost of performing service $j$ by agent
$i$ will be denoted by $v_i(j)$, and the cost of performing both services by
agent $i$ will be denoted by $v_i(\{1, 2\})$. The fact that there exists some
interaction between the services will be captured by the fact that it might be
that $v_i(\{1, 2\}) \neq v_i(1) + v_i(2)$. For example, a carrier's cost for a
pair of flights might be lower than the sum of costs for each of the flights
since these flights might refer to two consecutive periods or two consecutive
routes (notice the similarity with combinatorial auctions, a popular topic in
the recent AI literature [Fujishima et al., 1999; Sandholm, 1999; Tennenholtz,
2000]; in both cases the valuation for a pair of goods might be different from
the sum of valuations of these goods).

The procurement setting definition will now be revised to have two services,
and $n$ agents with costs/types as above, selected from a set $V$. For ease of
exposition let $V = [0, b]$ for some $b > 0$.

Definition 6 Given a procurement setting with two services, and with $n$
possible service providers, $N = \{1, 2, \ldots, n\}$, where costs for single
services and for the pair of services are selected from the set $V$, a
procurement protocol $(M, g, c, d)$ is a tuple where $M = V^3$ is the set of
messages, and a message $m = (m_1, m_2, m_3)$ declares the costs for service
1, 2, and the pair $\{1, 2\}$ respectively, where $g$ is the allocation
function, $c$ determines the payments to the service providers, and $d$
determines the other agents' payments. Such a protocol is called incentive
compatible if for every valuation $v_i(1), v_i(2), v_i(\{1, 2\}) \in V$ of
agent $i$, its payoff is maximized by sending the message
$m_i = (v_i(1), v_i(2), v_i(\{1, 2\}))$ regardless of the messages of the
other agents.

The definition of $k$-efficiency can be extended in various ways in order to
handle the case of two interacting services. We now describe one of these
possible extensions.

Definition 7 Given a procurement setting $S$ with two interacting services,
and $n$ agents, a procurement protocol $P$ will be called $k$-efficient if the
following holds:

1. $P$ is incentive compatible.

2. For any tuple of costs of the agents we have:

(a) The sum of payments from the center to the agents is non-positive.

(b) The payment by agent $i$ is no more than
$\frac{v_{[k]}(1) + v_{[k]}(2)}{n}$, where $v_{[k]}(j)$ is the $k$ order
statistic of $\{v_1(j), v_2(j), \ldots, v_n(j)\}$.

It is easy to extend our infeasibility results for 1-efficiency and for
2-efficiency to the case of two interacting services. As we will now show, the
positive result on 3-efficiency can be generalized as well.


Consider  the  following  protocol: 

Fair3b:

1. Each supplier is asked to reveal its costs to the center.

2. The tasks will be allocated to the agents who have announced the lowest
cost; notice that both tasks can be allocated to the same agent, or the tasks
can be allocated to different agents.

3. If a supplier $s$ has been selected to supply both services then he will be
paid as follows: an allocation of the lowest cost possible, ignoring this
supplier's messages, will be calculated, and the supplier $s$ will be paid
according to the cost associated with this allocation.

4. If a supplier $s$ has been selected to supply the service $x$, and another
supplier $s'$ has been selected to supply the service $y$, then $s$ will be
paid as in (3), minus the cost associated with supplying $y$ by $s'$.

5. For each agent $i$ we consider the second lowest declared cost, $c_i(j)$,
of the other agents for service $j$. Agent $i$ will be asked to pay to the
center $\frac{c_i(1) + c_i(2)}{n}$.

We  can  now  show: 

Proposition 3 Given a procurement setting $S$ with two interacting services
and $n$ agents, Fair3b is a 3-efficient protocol for that setting.
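
One possible reading of the five steps of Fair3b is sketched below. It rests on several assumptions that the paper does not spell out: step 2 is taken to mean the allocation with the lowest total declared cost, ties are broken by enumeration order, and at least three agents are present. The names `Report`, `cheapest`, and `fair3b` are hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Tuple

# (declared cost of service 1, of service 2, of the pair {1, 2})
Report = Tuple[float, float, float]


def cheapest(reports: Sequence[Report],
             excluded: Optional[int] = None) -> Tuple[float, int, int]:
    """Lowest-cost allocation among agents other than `excluded`.

    Returns (total cost, provider of service 1, provider of service 2).
    """
    agents = [i for i in range(len(reports)) if i != excluded]
    # One agent supplies both services at its declared pair cost.
    best = min((reports[i][2], i, i) for i in agents)
    # Two different agents supply one service each.
    for i, j in permutations(agents, 2):
        cand = (reports[i][0] + reports[j][1], i, j)
        if cand[0] < best[0]:
            best = cand
    return best


def fair3b(reports: Sequence[Report]):
    """Return ((provider of 1, provider of 2), payments to agents, fees)."""
    n = len(reports)
    if n < 3:
        raise ValueError("this sketch needs at least three agents")
    _, p1, p2 = cheapest(reports)               # step 2
    paid = [0.0] * n
    for s in {p1, p2}:
        without_s = cheapest(reports, excluded=s)[0]
        if p1 == p2:                            # step 3: s supplies both
            paid[s] = without_s
        elif s == p1:                           # step 4: subtract other's cost
            paid[s] = without_s - reports[p2][1]
        else:
            paid[s] = without_s - reports[p1][0]
    fees: List[float] = []                      # step 5
    for i in range(n):
        total = 0.0
        for service in (0, 1):
            others = sorted(reports[j][service] for j in range(n) if j != i)
            total += others[1]                  # second lowest among others
        fees.append(total / n)
    return (p1, p2), paid, fees
```

For the declared costs `[(5, 20, 30), (20, 6, 30), (10, 10, 25)]`, agent 0 supplies service 1 and agent 1 supplies service 2 (total cost 11). Each provider is paid 10, and the fees are 10, 10, and 40/3.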

9  Discussion 

Social systems face the challenge of distributing efforts among service
providers in a way that will obtain the society's goals, while attempting to
minimize costs for the individuals in that society. Therefore, the problem of
fair imposition of tasks appears in a variety of domains, and is fundamental
to obtaining efficient and fair procurement procedures. In this paper we have
introduced a general rigorous setting where the fair imposition of tasks can
be studied. Using this setting, we have provided general results on the fair
imposition of services in multi-agent systems.

Our study deals with the problem of task allocation with self-motivated
service providers, where the center can enforce agents' actions (e.g. their
payments). The complexity of this setting stems from the fact that the
participants can try and cheat the center about their private information
(e.g. their costs) while the center wishes to keep the allocation fair and to
minimize the expenses of each individual participant. As a result, our study
complements previous work on social laws (e.g. [Moses and Tennenholtz, 1995;
Shoham and Tennenholtz, 1995]) and on the imposition of protocols on
multi-agent systems (e.g. [Minsky, 1991a; 1991b]), as well as work in
information economics (see [Kreps, 1990] for a general discussion). Our work
also complements work in Distributed AI dealing with rules on interactions for
self-motivated agents (e.g. [Rosenschein and Zlotkin, 1994; Kraus, 1997]), as
well as work bridging the gap between Game Theory and Computer Science
[Boutilier et al., 1997; Tennenholtz, 1999; Varian, 1995]. Perhaps the most
relevant line of research is work on optimal auction design (which is
isomorphic to work on optimal design of procurement protocols; see
[Wolfstetter, 1996; McAfee and McMillan, 1987; Milgrom, 1987] for overviews).
However, our setting, where fairness is the major objective (rather than
economic efficiency or center's revenue) and behaviors can be enforced (hence,
participation constraints do not apply), distinguishes our line of research
from the economic mechanism design literature.

We see the procurement setting as a basic building block for general task
allocation in multi-agent systems. Hence, in order to address general task
allocation in non-cooperative environments we need to obtain a deep
understanding of the basic procurement setup. This seems essential for the
design of protocols for non-cooperative environments. Needless to say, when
facing this fundamental need, basic work in AI dealing with protocols for
non-cooperative environments and work on mechanism design in game theory share
much in common. The notion of fair imposition and its study are the
contributions of this paper to these lines of research.

Much is left to be done. In particular, the study of the multi-item (i.e.
interacting services) setting should be further developed. In addition, an
explicit treatment of repeated procurement situations is a challenge of
considerable importance. A more specific challenge is to try and come up with
a biased 2-efficient protocol. We plan to pursue these extensions in our
future work. We believe that the study of the fair imposition of tasks is a
challenge of fundamental importance to the design of effective multi-agent
systems, and hope that others will join us in this effort.

Abstract

We introduce the notion of fault tolerant mechanism design, which extends the
standard game theoretic framework of mechanism design to allow for uncertainty
about execution. Specifically, we define the problem of task allocation in
which the private information of the agents is not only their costs to attempt
the tasks, but also their probabilities of failure. For several different
instances of this setting we present technical results, including positive
ones in the form of mechanisms that are incentive compatible, individually
rational and efficient, and negative ones in the form of impossibility
theorems.

1  INTRODUCTION 

Recent years have seen much activity at the interface of computer science and
game theory, in particular in the area of Mechanism Design, or MD (e.g.
(Parkes & Ungar 2000; Boutilier, Shoham, & Wellman 1997; Shoham & Tennenholtz
2001; Nisan & Ronen 2001)). A sub-area of game theory, MD is the science of
crafting protocols for self-interested agents, and as such is natural fodder
for computer science in general and AI in particular. The uniqueness of the MD
perspective is that it concentrates on protocols for non-cooperative agents.
Indeed, traditional game theoretic work on MD focuses uniquely on the
incentive aspects of the protocols.

A promising application of MD to AI is the problem of task allocation among
self-interested agents (see e.g. (Rosenschein & Zlotkin 1994)). When only the
execution costs are taken into account, the task allocation problem allows
standard mechanism design solutions. However, this setting does not take into
consideration the possibility that agents might fail to complete their
assigned tasks. When this possibility is added to the framework, existing
results cease to apply. The goal of this paper is to investigate robustness to
failures in the game theoretic framework in which each agent is rational and
self-motivated. Specifically, we consider the design of protocols for agents
which have not only private cost functions, but also privately-known
probabilities of failure.

1 This work was supported in part by DARPA grant F30602-00-2-0598.

What criteria should such protocols meet? Traditional MD has a standard set of
criteria for successful outcomes, namely social efficiency (maximizing the sum
of the agents' utilities), individual rationality (positive utility for all
participants), and incentive compatibility (incentives for agents to reveal
their private information). Fault Tolerant Mechanism Design (FTMD) strives to
satisfy these same goals; the key difference is that the agents have richer
private information (namely probability of failure, in addition to cost). As
we will see, this extension presents novel challenges.

It is important to distinguish between different possible types of failure.
The focus of this paper is on failures that occur when agents make a full
effort to complete their assigned tasks, but may fail. A more nefarious
situation would be one in which agents may also fail deliberately when it is
rational to do so. While we do not formally consider this possibility, we will
revisit it at the end of the paper to explain why our results hold in this
case as well. Finally, one can consider the case in which there exist
irrational agents whose actions (for example, intentional failures) are
counter to their best interests. This is the most difficult type of failure to
handle, because the presence of such agents can affect the strategy of
rational agents, in addition to directly affecting the outcome. We leave this
case to future work.

It is helpful to consider a concrete example. Consider a network of links
which are owned by selfish agents (e.g. airline companies), and two
distinguished nodes S and T in it. We allow multiple links between nodes so
that more than one agent can provide the same service (but only one agent can
be selected to do so). When an object is routed through a link, the owning
agent incurs some cost. In addition, the agent may fail (according to some
probability) to pass the object across the link (e.g., the object is lost in
transit, or not delivered by a strict deadline). The costs and probabilities
are privately known to their owners. Our goal is to design a mechanism
(protocol) that will ensure that objects will be sent from S to T across the
network in the most reliable and cost-effective way possible.

To demonstrate the challenges encountered when facing such problems, consider
even the simple case in which the network consists of only parallel links
between S and T, and costs are all zero. A naive protocol would ask each agent
for their probability, choose the most reliable agent (according to the
declarations) and pay her a fixed, positive amount if she succeeds, and zero
otherwise. Of course, in this case each agent will report a probability of one
in order to selfishly maximize her own expected profit.
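
The incentive problem with the naive protocol can be made concrete with a small calculation. The sketch below is illustrative only: the function name `expected_profit` is hypothetical, costs are zero as in the text, and it assumes that an agent is selected only when her report is strictly the highest (the text does not specify tie-breaking).

```python
from typing import Sequence


def expected_profit(
    true_p: float,
    reported_p: float,
    rival_reports: Sequence[float],
    reward: float,
) -> float:
    """Expected profit of one agent under the naive protocol with zero costs.

    The agent is selected only if her report strictly exceeds every rival
    report (assumed tie-breaking); if selected, she earns `reward` with her
    true success probability and nothing otherwise.
    """
    selected = reported_p > max(rival_reports)
    return true_p * reward if selected else 0.0
```

An agent with true success probability 0.3 facing rival reports of 0.8 and 0.6 earns nothing by reporting truthfully, but has a positive expected profit of `0.3 * reward` by reporting 1.0. Her payment never depends on the reported value itself, so overstating is never costly to her.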

In this paper we study progressively more complex task-allocation problems.
The first problem that we study is one in which there is only one task. We use
this setting both to show why standard MD solutions are not applicable and to
present our basic technique in the form of a novel mechanism. After extending
this technique to handle the case of multiple tasks without dependencies among
them, we move to the general case of dependent tasks. Here, we prove an
impossibility result when we demand incentive compatibility in dominant
strategies, and present a mechanism that solves in equilibrium the case of
dependent tasks. Finally, we discuss the use of cost verification to
significantly improve the revenue properties of the center.

2  RELATED  WORK 

The work presented in this paper integrates techniques of economic mechanism
design (an introduction to MD can be found in (Mas-Colell, Whinston, & Green
1995, chapter 23)) with studies of fault tolerant problem solving in computer
science and AI.

In particular, the technique used in our mechanism is similar to that of the
Generalized Vickrey Auction (GVA) (Vickrey 1961; Clarke 1971; Groves 1973) in
that it aligns the utility of the agents with the overall welfare. This
similarity is almost unavoidable, as this alignment is the only known general
principle for solving mechanism design problems. However, because we allow for
the possibility of failures, we will need to change the GVA in a significant
way in order for our mechanism to achieve this alignment.

Because we have added probabilities to our setting, our mechanisms may seem to
be related to the Expected Externality Mechanism (or d'AGVA) (d'Aspremont &
Gerard-Varet 1979), but there are key differences. In the setting of d'AGVA,
the types of the agents are drawn (independently) from a distribution which is
assumed to be common knowledge among the participants. The two key differences
in our setting are that no such common knowledge assumption is made and that
the solution concepts which we guarantee are stronger than that of d'AGVA.

A  recent  paper  (Eliaz  2002)  also  considers  failures  in 
MD,  but  solves  a  different  problem.  This  work  assumes 
that  agents  know  the  types  of  all  other  rational  agents 
and  also  limits  the  failures  that  can  occur  by  bounding 
the  number  of  irrational  agents. 

Finally, the design of protocols which are robust to failures has a long
tradition in computer science (for a survey, see (Linial 1994)). Work in this
area, however, almost always assumes a set of agents that are by and large
cooperative and adhere to a central protocol, except for some subset of
malicious agents who may do anything to disrupt the protocol. In MD settings,
the participants fit neither of these classes, but are simply self-interested.

3  A  BASIC  MODEL 

In this section we describe our basic model and notation, which will be
modified later to handle specific settings.

In a FTMD problem, we have a set of $t$ tasks $\tau = \{1, \ldots, t\}$ and a
set $N = \{1, \ldots, n\}$ of self-interested agents to which the tasks can be
assigned. We also have a center $M$ who assigns tasks to agents and pays them
for their work. The center and the agents will collectively be called the
participants.

Each agent $i$ has, for each task $j$, a probability $p_{ij} \in [0, 1]$ of
successfully completing task $j$, and a nonnegative cost $c_{ij} \in \Re^+$ of
attempting the task. We assume that the cost of attempting a task does not
depend on the success of the attempt. We use $p_i = (p_{i1}, \ldots, p_{it})$
for the set of all probabilities for agent $i$, and use
$p = (p_1, \ldots, p_n)$ to represent the set of probability vectors for all
agents. We use corresponding notation for $c_i$ and $c$. The pair
$\theta_i = (p_i, c_i)$ is called the agent's type and is privately known to
the agent. Each agent is assigned a set $A_i$ of tasks, and her cost to
attempt the set is $c_i(A_i) = \sum_{j \in A_i} c_{ij}$. We define
$\theta = (\theta_1, \ldots, \theta_n)$ as the vector of types for all agents.

We use a completion vector $\mu \in \{0, 1\}^t$ to denote which
tasks have been completed. The function $V : \{0, 1\}^t \to
\mathbb{R}^+$ defines the center’s nonnegative valuation for each
possible completion vector. For now, we assume that
the center has a non-combinatorial valuation for a set
of tasks. That is, the value of a set of tasks is the sum
of the values for the individual tasks. We also assume
that $V(\mu) \ge 0$ for all $\mu$ and that $V(0, \ldots, 0) = 0$.

An assignment vector $A = (A_1, \ldots, A_n)$ and a vector
of agent probabilities $p$ together induce a probability
distribution over the completion vector, which
we denote by $[\mu \mid A, p]$. Given an assignment $A$, a
type vector $\theta$ and a completion vector $\mu$, we define
the welfare of the participants as $W(A, c, \mu) =
V(\mu) - \sum_i c_i(A_i)$. We define the expected welfare as
$\bar{W}(A, c, p) = E_{[\mu \mid A, p]}[W(A, c, \mu)]$. The goal of the center
is to design a mechanism (protocol) that maximizes
this expected welfare.
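The expected welfare defined above can be computed by direct enumeration for small instances. The following is a minimal sketch, not the authors' implementation: it assumes that task outcomes are independent given the assignment, represents an assignment as a list mapping each task to an agent index (or `None` for an unassigned task), and uses hypothetical names (`completion_distribution`, `expected_welfare`) and hypothetical example numbers.

```python
from itertools import product

def completion_distribution(assignment, p):
    """Yield (mu, probability) pairs for the distribution [mu | A, p].

    assignment[j] is the index of the agent holding task j, or None if the
    task is unassigned; p[i][j] is agent i's success probability on task j.
    Assumption: task outcomes are independent.
    """
    for mu in product((0, 1), repeat=len(assignment)):
        prob = 1.0
        for j, done in enumerate(mu):
            agent = assignment[j]
            success = 0.0 if agent is None else p[agent][j]
            prob *= success if done else 1.0 - success
        if prob > 0.0:
            yield mu, prob

def expected_welfare(assignment, p, c, V):
    """Expected welfare: E[V(mu)] minus the cost of every assigned task."""
    cost = sum(c[a][j] for j, a in enumerate(assignment) if a is not None)
    value = sum(prob * V(mu) for mu, prob in completion_distribution(assignment, p))
    return value - cost

# Usage with a hypothetical additive valuation over two tasks.
task_values = (10.0, 20.0)
V = lambda mu: sum(v for v, done in zip(task_values, mu) if done)
p = [[0.5, 0.0], [0.0, 0.25]]   # p[i][j]
c = [[1.0, 0.0], [0.0, 2.0]]    # c[i][j]
print(expected_welfare([0, 1], p, c, V))  # 0.5*10 + 0.25*20 - 3 = 7.0
```

Enumeration is exponential in the number of tasks, so this form is only useful for checking small examples by hand.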

We assume that each task can be assigned only once.
The center does not have to allocate all the tasks. For
notational convenience we assume that all the non-allocated
tasks are assigned to a dummy agent 0 which
for each task has zero probability of success and zero
cost to attempt.

When an agent $i$ is assigned a set $A_i$ of tasks and
is paid $R_i$, her utility equals $u_i = R_i - c_i(A_i)$. Since
our setting is stochastic by nature, an agent can do
no better than to maximize her expected utility, $\bar{u}_i$,
calculated before any task is attempted. This term
thus depends on the true probabilities of success of
the agents, as explained below.

Throughout the paper we shall use the following vector
notations: The subscript $-i$ on a vector denotes that
the term for agent $i$ has been omitted from the vector.
For example, $p_{-i} = (p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_n)$. The
omitted term can be combined with such a vector by
using the following notation: $p = (p_i, p_{-i})$. We denote
by $\mu_i$ the completion vector for agent $i$ (i.e. we
have 1 for each task accomplished by agent $i$ and 0 for
each one either failed by her or not assigned to her).
The definitions for $\mu_{-i}$ and $(\mu_i, \mu_{-i})$ follow similarly.
Sometimes we will use $p_i$ in place of $\mu_i$. Since both
vectors are of the same form, a 0 or 1 for task $t_j$ in $\mu_i$
becomes the probability of successfully completing $t_j$.

3.1  Mechanisms 

A mechanism is a protocol that decides how to assign
the tasks to the agents and how much each agent is
paid. The simplest mechanisms are those in
which the agents are simply required to report their
types. (Of course they may lie!) The revelation principle
(see e.g. (Mas-Colell, Whinston, & Green 1995,
p. 871)) tells us that we can, w.l.o.g., restrict ourselves
to such mechanisms.


We denote by the vector $\hat\theta$ the types declared by the
agents. A mechanism is thus defined by a pair $g =
(A(\hat\theta), R(\hat\theta, \mu))$ such that:

•  $A(\hat\theta) = (A_1(\hat\theta), \ldots, A_n(\hat\theta))$ is an assignment function.
It takes a declaration vector and returns an
assignment of the tasks to the agents.

•  $R(\hat\theta, \mu) = (R_1(\hat\theta, \mu), \ldots, R_n(\hat\theta, \mu))$ is the payment
function.

In our motivating example, a type $\theta_i$ would correspond
to agent $i$’s costs and probabilities of success on each
of her edges.

In our protocol, the center first asks each agent to declare
her type. We call an agent truthful if she reveals
her true type to the center. Based on these declarations
the center first computes the assignment $A(\hat\theta)$.
Then, the agents execute their tasks. Finally, the center
pays the agents. Note that these payments depend
on the set of tasks which were accomplished. We
assume that the agents always attempt each task to
which they are assigned. In our discussion section, we
explain why this is a valid assumption.

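In the above protocol, the utility of agent $i$ is
$u_i(c_i, \hat\theta_i, \hat\theta_{-i}, \mu) = R_i(\hat\theta, \mu) - c_i(A_i(\hat\theta))$,
and her expected utility is
$\bar{u}_i(c_i, \hat\theta_i, \hat\theta_{-i}, p) = E_{[\mu \mid A(\hat\theta), p]}[u_i(c_i, \hat\theta_i, \hat\theta_{-i}, \mu)]$.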

The main difference between mechanism design problems
and the usual algorithmic problems is that the
participating agents may manipulate the given protocol
if it is beneficial for them to do so. We therefore
need to design protocols that fulfill our objectives even
though the agents behave selfishly. We thus require
our mechanism to satisfy the following standard properties:

Individual rationality (IR) holds when truthful agents
are guaranteed to have non-negative expected utility.
Formally, for all $i$, $\theta$ and $\hat\theta_{-i}$: $\bar{u}_i(c_i, \theta_i, \hat\theta_{-i}, p) \ge 0$.

Incentive compatibility (IC) holds when it is a dominant
strategy for each agent to declare her type truthfully.
Formally, this condition holds when, for all $i$, $\theta$,
$\theta'_i$, and $\hat\theta_{-i}$: $\bar{u}_i(c_i, \theta_i, \hat\theta_{-i}, p) \ge \bar{u}_i(c_i, \theta'_i, \hat\theta_{-i}, p)$. This
means that the expected utility of the agent (conditional
on her own probability of success) is maximized
when the agent reports her true type.

A mechanism is called socially efficient (SE) if the chosen
assignment maximizes the expected welfare $\bar{W}$.
The fact that $\bar{W}$ depends on the true types of the
agents underscores the importance of IC, which allows
the center to correctly assume that $\hat\theta = \theta$.

Individual rationality for the center (CR) holds if the
center’s utility $u_M = V(\mu) - \sum_i R_i(\cdot)$ is always nonnegative.
CR is an extension of the standard mechanism
design requirement of weak budget balance to account
for the center’s utility for outcomes.

A  final  goal  is  no  free  riders  (NFR),  which  holds  if  all 
agents  not  assigned  any  task  have  a  revenue  of  zero. 

4  SINGLE  TASK  SETTING 

We will start with the special case of a single task
to show our basic technique to handle the possibility
of failures in MD. For expositional purposes, we will
analyze two restricted settings (the first restricts probabilities
of success to be one, and the second restricts
costs to be zero), before formally proving properties
about our mechanism in the full single task setting.

Because there is only one task, we can simplify the
notation. We let $c_i$ and $p_i$ denote $c_{i1}$ and $p_{i1}$, respectively.
Similarly, we let $V = V((1))$, which is the
value that the center assigns to the completion of the
task. For each mechanism, we will use the index [1]
to denote the agent selected to attempt the task (e.g.,
$p_{[1]}$ denotes the selected agent’s probability of success).
The subscript [2] then refers to the agent who would
be selected as the service provider if agent [1] had not
participated.

4.1  CASE  1:  ONLY  COSTS 

When we do not allow for failures (that is, $\forall i\; p_i = 1$),
the goal of social efficiency reduces to assigning the
task to the lowest-cost agent. This simplified problem
can be solved using a second-price auction (which is a
specific case of GVA). Each agent declares a cost, the
task is assigned to the agent with the lowest cost, and
that agent is paid the second-lowest submitted cost.

4.2  CASE  2:  ONLY  FAILURES 

We now reduce the problem in a different way, by assuming
all costs to be zero ($\forall i\; c_i = 0$). In this case, the
main goal is to allocate the task to the most reliable
agent. Interestingly, we cannot use a straightforward
application of the GVA for this case. Such a mechanism
would ask each agent to declare a probability
of success and assign the task to the agent with the
highest declared probability. It would set the revenue
function for all agents not assigned the task to be 0,
while the service provider would be paid the amount
by which her presence increases the welfare of the other
agents and the center: $p_{[1]} V - p_{[2]} V$. Obviously, such
a mechanism is not incentive compatible, because the
payment to the service provider depends on her own
declared type! Since there are no costs, each agent
would have a dominant strategy to declare her probability
of success as one.²

Thus, we need to fundamentally alter our payment
rule so that it depends on the outcome of the attempt,
and not solely on the declared types, as it does in the
GVA. The key difference in our setting that forces this
change is the fact that the true type of an agent now
directly affects the outcome, whereas in the standard
MD setting the type of an agent only affects her preferences
over outcomes. We accomplish our goals by
replacing $p_{[1]}$ with an indicator function that is 1 if
the task was completed, and 0 otherwise. The payment
rule for the service provider is now $V - p_{[2]} V$ if
she succeeds and $-p_{[2]} V$ if she fails. Just as in the
previous setting, the service provider is the only agent
who has positive utility for attempting the task with
the corresponding payment rule. The expected utility
for agent $i$ would be $V \cdot (p_i \cdot (1 - p_{[2]}) - (1 - p_i) \cdot p_{[2]})$.
This expression is positive for the agent iff $p_i > p_{[2]}$,
which is only true for the service provider.
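The expected-utility expression above can be expanded as a short check. The derivation below only restates the payment rule of this subsection (payment $V - p_{[2]} V$ on success, $-p_{[2]} V$ on failure, zero cost), taking agent $i$ to be the one who attempts the task.

```latex
\begin{align*}
\bar{u}_i &= p_i \,(V - p_{[2]} V) + (1 - p_i)\,(-p_{[2]} V) \\
          &= V \cdot \bigl(p_i (1 - p_{[2]}) - (1 - p_i)\, p_{[2]}\bigr) \\
          &= V \cdot (p_i - p_{[2]}).
\end{align*}
```

Here $p_i$ is the agent's true probability of success, and her own declaration does not appear in either payment, so overstating it cannot raise this quantity. The last line is positive exactly when $p_i > p_{[2]}$.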

4.3  CASE  3:  COSTS  AND  FAILURES 

We  now  consider  the  case  of  one  task  with  both  costs 
and  failures. 

We introduce the following definition that we will use
throughout the paper: Given an agent $i$ we denote by
$W^*_{-i}(\hat{c}_{-i}, \hat{p}_{-i})$ the optimal (declared) expected welfare
when tasks cannot be allocated to agent $i$. In the single
task case it is $\max_{j \ne i}(\hat{p}_j \cdot V - \hat{c}_j)$. Now we can define
the mechanism.

Single Task Mechanism:

Assignment: The mechanism chooses an agent $i \in
\{0, \ldots, n\}$ that maximizes the (declared) expected
welfare $\bar{W} = \hat{p}_i \cdot V - \hat{c}_i$.

Payment: The payment to all agents not assigned a
task is always zero. The payment to the “winner”
$i$ is defined as follows:

$$R_i = \begin{cases} V - W^*_{-i}(\hat{c}_{-i}, \hat{p}_{-i}) & \text{if } i \text{ succeeds} \\ -W^*_{-i}(\hat{c}_{-i}, \hat{p}_{-i}) & \text{if } i \text{ fails} \end{cases}$$

Using $p_{[2]}$ and $c_{[2]}$ to denote the probability and the
cost of the “second best” agent, the payment to agent
$i$ when she succeeds is $(V - p_{[2]} \cdot V + c_{[2]})$ and when she
fails is $(-p_{[2]} \cdot V + c_{[2]})$. Note that $W^*_{-i}$ is never negative
because the center will never make an assignment that
yields an expected loss for the system.

² In fact, this would be true for any payment rule for
which an agent’s payment is always nonnegative, which is
the reason why we require our goals (such as IC and IR) to
be satisfied for the expected utility of the agent, and not
for ex post utility.

For example, suppose we have three agents with the
types listed in Table 1. Let $V$ be 210. If the agents are
truthful, then the winner is agent 3. If agent 3 did not
exist, the optimal expected welfare would be $W^*_{-3} =
210 - 100 = 110$, because the task would be assigned
to agent 2. The payment for agent 3 is therefore $210 -
110 = 100$ if she succeeds and $-110$ if she fails. Agent
3’s own costs are 60, and thus her expected utility is
$(100 - 60) \cdot 0.9 + (-110 - 60) \cdot 0.1 = 19$.


Agent    $c_i$    $p_i$
1        30       0.5
2        100      1.0
3        60       0.9

Table 1: A Single Task Example
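The Single Task Mechanism and the example of Table 1 can be traced in a few lines of code. This is an illustrative sketch rather than the authors' implementation: the names `Bid`, `best_welfare`, and `single_task_mechanism` are hypothetical, agents are indexed from 0 (so agent 3 is index 2), and ties are broken in favour of the lowest index, with the task left unassigned when no agent yields positive declared welfare.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Bid:
    cost: float  # declared cost of attempting the task
    prob: float  # declared probability of success

def best_welfare(bids: List[Bid], value: float,
                 exclude: Optional[int] = None) -> Tuple[Optional[int], float]:
    """Best declared expected welfare, optionally ignoring one agent.

    Returns (winner index, welfare); winner is None when the dummy agent
    (welfare 0) is at least as good as every real agent.
    """
    winner, best = None, 0.0
    for i, bid in enumerate(bids):
        if i == exclude:
            continue
        welfare = bid.prob * value - bid.cost
        if welfare > best:
            winner, best = i, welfare
    return winner, best

def single_task_mechanism(bids: List[Bid], value: float):
    """Return (winner, payment if she succeeds, payment if she fails)."""
    winner, _ = best_welfare(bids, value)
    if winner is None:
        return None, 0.0, 0.0
    # W*_{-i}: optimal declared welfare when the winner is removed.
    _, welfare_without = best_welfare(bids, value, exclude=winner)
    return winner, value - welfare_without, -welfare_without

# Table 1 with V = 210 (index 2 is agent 3).
bids = [Bid(30, 0.5), Bid(100, 1.0), Bid(60, 0.9)]
winner, pay_success, pay_failure = single_task_mechanism(bids, 210)
expected_utility = (bids[winner].prob * (pay_success - bids[winner].cost)
                    + (1 - bids[winner].prob) * (pay_failure - bids[winner].cost))
print(winner, pay_success, pay_failure)  # 2 100.0 -110.0
```

The winner's expected utility computed this way agrees with the value 19 derived in the text, up to floating-point rounding.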


Before we prove the properties of this mechanism, let
us introduce two definitions that we shall use throughout
the paper. Given an agent $i$, we define the
welfare of the other participants $W_{-i}(A, c_{-i}, \mu) =
V(\mu) - \sum_{j \ne i} c_j(A_j)$. Note that $W_{-i}(A, c_{-i}, \mu) =
W(A, c, \mu) + c_i(A_i)$. We define the expected welfare
for the other participants as $\bar{W}_{-i}(A, c_{-i}, (p_{-i}, \mu_i)) =
E_{[\mu_{-i} \mid A, p_{-i}]}[W_{-i}(A, c_{-i}, (\mu_i, \mu_{-i}))]$. This is the expected
welfare of all the other participants (including
the center) when the allocation is $A$ and agent $i$ has
completed exactly the set of tasks defined by $\mu_i$.

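It is not difficult to see that the payment $R_i$
of each agent $i$ equals $\bar{W}_{-i}(A, \hat{c}_{-i}, (\hat{p}_{-i}, \mu_i)) -
W^*_{-i}(\hat{c}_{-i}, \hat{p}_{-i})$. Her expected utility is therefore
$\bar{u}_i = -c_i(A_i) + E_{[\mu_i \mid A_i, p_i]}[\bar{W}_{-i}(A, \hat{c}_{-i}, (\hat{p}_{-i}, \mu_i))] -
W^*_{-i}(\hat{c}_{-i}, \hat{p}_{-i})$. Since the distribution $[\mu \mid A, (p_i, \hat{p}_{-i})]$
equals $[\mu_i \mid A_i, p_i] \cdot [\mu_{-i} \mid A, \hat{p}_{-i}]$, we get that $\bar{u}_i =
\bar{W}(A, (c_i, \hat{c}_{-i}), (p_i, \hat{p}_{-i})) - W^*_{-i}(\hat{c}_{-i}, \hat{p}_{-i})$. This means
that each agent gets paid her contribution to the expected
welfare of the other participants.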

Theorem  1  The  Single  Task  mechanism  satisfies  IR, 
IC,  CR,  SE,  and  NFR. 

Proof: 

We  will  prove  each  property  separately. 

1.  Individual Rationality

We need to prove that the expected utility
of a truthful agent is always nonnegative.
When agent $i$ is truthful, her expected utility
is $\bar{u}_i = \bar{W}(A((\theta_i, \hat\theta_{-i})), (c_i, \hat{c}_{-i}), (p_i, \hat{p}_{-i})) -
W^*_{-i}(\hat{c}_{-i}, \hat{p}_{-i})$. By the optimality of $A(\cdot)$, the second
term quantifies the optimal welfare that can
be obtained when the types of the other agents are
$\hat\theta_{-i}$ and $i$ does not exist. Similarly, the first term
quantifies the optimal welfare when the types of
the other agents are $\hat\theta_{-i}$ and the type of agent $i$
is the true one $\theta_i$. Every assignment that is feasible
without $i$ remains feasible when $i$ is present, so $i$ can
only improve the total welfare, which proves our claim.

2.  Incentive Compatibility

We need to prove that the expected utility of each
agent $i$ is maximized when she is truthful. Let
$\hat\theta_{-i}$ denote the declarations of the other agents.
As before, when the agent is truthful her utility
is $\bar{u}_i = \bar{W}(A((\theta_i, \hat\theta_{-i})), (c_i, \hat{c}_{-i}), (p_i, \hat{p}_{-i})) -
W^*_{-i}(\hat{c}_{-i}, \hat{p}_{-i})$. Consider the case where the agent
reports another type $\theta'_i$. This results in an assignment
$A'$. The utility of agent $i$ in this case is
$\bar{u}'_i = \bar{W}(A', (c_i, \hat{c}_{-i}), (p_i, \hat{p}_{-i})) - W^*_{-i}(\hat{c}_{-i}, \hat{p}_{-i})$.
Assume by contradiction that $\bar{u}'_i > \bar{u}_i$.
This means that $\bar{W}(A', (c_i, \hat{c}_{-i}), (p_i, \hat{p}_{-i})) >
\bar{W}(A((\theta_i, \hat\theta_{-i})), (c_i, \hat{c}_{-i}), (p_i, \hat{p}_{-i}))$. However, this
contradicts the optimality of $A((\theta_i, \hat\theta_{-i}))$.

3.  Individual Rationality for the Center

Let agent $i$ be the winner. We need to show that
the utility for the center is always nonnegative.
There are two cases. When agent $i$ succeeds
we have $u_M = V - (V - W^*_{-i}(\hat{c}_{-i}, \hat{p}_{-i})) =
W^*_{-i}(\hat{c}_{-i}, \hat{p}_{-i}) \ge 0$. When $i$ fails, the value is zero
and thus $u_M = W^*_{-i}(\hat{c}_{-i}, \hat{p}_{-i}) \ge 0$.

4.  Social  Efficiency  Immediate  from  IC  and  the 
definition  of  A(.). 

5.  No Free Riders  Immediate from the definition
of the payment rule. ■
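The IR and IC arguments above can be spot-checked numerically on the example of Table 1. The sketch below is an illustration only, not a proof and not the authors' code: it fixes the other agents' declarations at their Table 1 types, tries a hypothetical grid of misreports for each agent, and uses the closed form $p_i V - c_i - W^*_{-i}$ for a winner's expected utility. The function names and the grid are assumptions.

```python
from itertools import product

def expected_utility(i, true_type, declared, value):
    """Expected utility of agent i under the Single Task Mechanism.

    true_type is agent i's real (cost, prob); declared[k] is the (cost, prob)
    reported by agent k, with declared[i] being agent i's own report.
    """
    true_cost, true_prob = true_type
    scores = [prob * value - cost for cost, prob in declared]
    best = max(scores)
    winner = scores.index(best) if best > 0 else None
    if winner != i:
        return 0.0  # agents without the task are paid nothing
    # W*_{-i}: best declared welfare without agent i (dummy agent gives 0).
    welfare_without = max([0.0] + [s for k, s in enumerate(scores) if k != i])
    # p*(V - W*) + (1 - p)*(-W*) - c  simplifies to  p*V - c - W*.
    return true_prob * value - true_cost - welfare_without

def spot_check(types, value, cost_grid, prob_grid, eps=1e-9):
    """True if no grid misreport beats truth-telling and truth-telling is IR."""
    for i, true_type in enumerate(types):
        truthful = expected_utility(i, true_type, types, value)
        if truthful < -eps:
            return False
        for report in product(cost_grid, prob_grid):
            declared = list(types)
            declared[i] = report
            if expected_utility(i, true_type, declared, value) > truthful + eps:
                return False
    return True

table1 = [(30, 0.5), (100, 1.0), (60, 0.9)]  # (cost, prob) per agent
ok = spot_check(table1, 210, range(0, 201, 20), [k / 10 for k in range(11)])
print(ok)
```

A misreport can only change whether the agent wins, never the price she faces, which is why no report on the grid improves on the truth.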

5  MULTIPLE  TASKS 

We now return to the original setting presented in this
paper, consisting of $t$ tasks for which the center has
a non-combinatorial valuation (that is, the value for a
set of tasks is equal to the sum of the values for the
individual tasks). Because the setting disallows any
interaction between tasks, we can construct a mechanism
that satisfies all of our goals by generalizing the
Single Task Mechanism.

Multiple Task Mechanism:

Assignment: The chosen assignment $A$ maximizes
the (declared) expected welfare $\bar{W}(A, \hat{c}, \hat{p}) =
E_{[\mu \mid A, \hat{p}]}[W(A, \hat{c}, \mu)]$.

Payment: The payment to each agent $i$ is defined
according to her completion vector $\mu_i$: $R_i =
\bar{W}_{-i}(A, \hat{c}_{-i}, (\hat{p}_{-i}, \mu_i)) - W^*_{-i}(\hat{c}_{-i}, \hat{p}_{-i})$


In  other  words,  each  agent  is  paid  according  to  her 
contribution  to  the  welfare  of  the  other  participants. 

Proposition 2  The Multiple Task mechanism satisfies
IC, IR, SE, CR, and NFR.

The proof is similar to the single task case and is omitted.
Note that the proposition holds even when the cost
functions and probabilities of success have a combinatorial
nature.
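For the non-combinatorial valuation considered here, the Multiple Task Mechanism can be sketched directly from its definition. The code below is a hedged illustration, not the authors' implementation: it assumes additive task values `v[j]` and additive costs, so that the welfare-maximizing assignment can be chosen task by task, and the names `assign` and `payment` as well as the second task's numbers are hypothetical. The first task reuses Table 1.

```python
def assign(p, c, v, exclude=None):
    """Task-by-task declared-welfare-maximizing assignment.

    Returns (assignment, welfare) where assignment[j] is the chosen agent
    index for task j (None if left unassigned) and welfare[j] is that
    task's declared expected welfare. Agent `exclude` is ignored if given.
    """
    assignment, welfare = [], []
    for j, value in enumerate(v):
        best_agent, best = None, 0.0
        for i in range(len(p)):
            if i == exclude:
                continue
            w = p[i][j] * value - c[i][j]
            if w > best:
                best_agent, best = i, w
        assignment.append(best_agent)
        welfare.append(best)
    return assignment, welfare

def payment(i, mu_i, p, c, v):
    """R_i = Wbar_{-i}(A, c_{-i}, (p_{-i}, mu_i)) - W*_{-i}(c_{-i}, p_{-i})."""
    assignment, welfare = assign(p, c, v)
    _, welfare_without = assign(p, c, v, exclude=i)
    others = 0.0
    for j, agent in enumerate(assignment):
        if agent == i:
            others += mu_i[j] * v[j]   # realized value of agent i's own tasks
        else:
            others += welfare[j]       # declared expected welfare of other tasks
    return others - sum(welfare_without)

# Declared types: task 0 is Table 1, task 1 uses made-up numbers.
v = [210, 50]
p = [[0.5, 1.0], [1.0, 0.5], [0.9, 0.0]]
c = [[30, 10], [100, 5], [60, 0]]
print(assign(p, c, v)[0])  # [2, 0]: agent 3 gets task 0, agent 1 gets task 1
```

With additive values the terms for tasks held by other agents cancel between the two parts of the payment, so each agent is effectively paid by the Single Task rule on each task she holds; an agent with no tasks is paid zero.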

5.1  COMBINATORIAL  V 

We now consider the setting in which the center’s valuation
$V(\cdot)$ can be any monotone function of the tasks.
Unfortunately, in this setting, it is impossible to satisfy
all our goals simultaneously.

Theorem 3  When $V$ is combinatorial, there does not
exist a mechanism that satisfies IC, IR, CR, and SE
for any $n \ge 2$ and $t \ge 2$.

Intuitively, it is enough to consider the following case,
which no mechanism is able to solve. There are two
tasks, each of which can only be completed by one
agent (and this one agent is different for the two
tasks). The center only has a positive value (call it
$x$) for both tasks being completed. Since both agents
add a value of $x$ to the system, they can each extract
close to $x$ from the center, causing the center to pay
double for the utility of $x$ he will gain from the completion
of the tasks. Due to space constraints, we omit
the formal proof of this theorem.

However,  despite  the  possibility  of  failures  we  can 
maintain  the  desired  properties  other  than  CR  using 
the  same  mechanism  as  before. 

Theorem  4  The  Multiple  Task  mechanism  satisfies 
IC,  IR,  SE,  and  NFR,  even  when  V  is  combinatorial. 

Again, we omit the proof. Intuitively, IC, IR, and NFR
are not affected by a combinatorial $V$ because they are
only properties of the agents, and SE still follows from
IC and the definition of $A(\cdot)$.
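The loss of CR can be illustrated by applying the Multiple Task Mechanism to the two-task case described earlier in this subsection. The numbers below are a sketch under added assumptions that are not part of the original example: both agents have zero cost and succeed with probability 1, agent 1 can only perform task 1, agent 2 can only perform task 2, and both declare truthfully and succeed.

```latex
\begin{align*}
V(1,1) &= x, \qquad V(\mu) = 0 \text{ for } \mu \neq (1,1), \\
W^*_{-1} &= W^*_{-2} = 0
  && \text{(without either agent, one task can never be completed)}, \\
R_1 &= \bar{W}_{-1}\bigl(A, \hat{c}_{-1}, (\hat{p}_{-1}, \mu_1)\bigr) - W^*_{-1} = x - 0 = x, \\
R_2 &= \bar{W}_{-2}\bigl(A, \hat{c}_{-2}, (\hat{p}_{-2}, \mu_2)\bigr) - W^*_{-2} = x - 0 = x, \\
u_M &= V(1,1) - R_1 - R_2 = x - 2x = -x < 0.
\end{align*}
```

Each agent is pivotal for the full value $x$, so the center pays $2x$ for a gain of $x$. This shows only that this particular mechanism violates CR in this case; the impossibility for every mechanism is the content of Theorem 3.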

5.2  DEPENDENCIES 

We now return to the case of non-combinatorial valuation
$V(\cdot)$, and analyze a different extension: dependencies
between the tasks.

Consider our motivating example of a network of
flights. A natural example of a task dependency would
be an object that could not be carried over the edge
$(b, c)$ before being carried over $(a, b)$.

Formally, we say that a task $j$ is dependent on a set $s$
of tasks if $j$ cannot be attempted unless all tasks in $s$
were successfully finished. We assume that there are
no dependency cycles. The tasks now are executed according
to a topological order. Note that if a task cannot
be attempted, the agent assigned that task does
not incur the costs of attempting it.³
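The execution rule with dependencies can be simulated as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: tasks are indexed from 0, `deps[j]` is the set of prerequisite tasks of `j`, the function name `execute` is hypothetical, and the topological order is taken from Python's standard `graphlib` module (Python 3.9 or later), which raises `CycleError` if the dependencies contain a cycle.

```python
import random
from graphlib import TopologicalSorter

def execute(assignment, p, c, deps, rng):
    """Run assigned tasks in a topological order of their dependencies.

    assignment[j] is the agent holding task j (None if unassigned),
    p[i][j] and c[i][j] are agent i's TRUE success probability and cost,
    deps[j] is the set of tasks that must succeed before j is attempted.
    Returns (completion vector, dict of costs actually incurred per agent).
    """
    n_tasks = len(assignment)
    graph = {j: deps.get(j, set()) for j in range(n_tasks)}
    completed = [0] * n_tasks
    incurred = {}
    for j in TopologicalSorter(graph).static_order():
        agent = assignment[j]
        if agent is None:
            continue
        if not all(completed[d] for d in graph[j]):
            continue  # prerequisite failed: task is not attempted, no cost
        incurred[agent] = incurred.get(agent, 0.0) + c[agent][j]
        if rng.random() < p[agent][j]:
            completed[j] = 1
    return completed, incurred

# Task 1 depends on task 0. Agent index 1 holds task 0 but always fails,
# so agent index 0 never gets to attempt task 1 and incurs no cost.
p_true = [[1.0, 1.0], [0.0, 0.0]]
c_true = [[2.0, 1.0], [1.0, 0.0]]
completed, incurred = execute([1, 0], p_true, c_true, {1: {0}}, random.Random(0))
print(completed, incurred)  # [0, 0] {1: 1.0}
```

The usage example mirrors the situation exploited in the proof of Theorem 5 below: a failure on a prerequisite task leaves the completion vector at all zeros regardless of the downstream agent's type.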

However,  the  presence  of  dependencies,  just  like  the 
presence  of  a  combinatorial  V,  makes  it  impossible  to 
satisfy  IC,  IR,  CR,  and  SE. 

Theorem 5  When dependencies exist between tasks,
there does not exist a mechanism that satisfies IC, IR,
CR, and SE for any $n \ge 2$ and $t \ge 2$.

Proof:  Proof by induction. We first show that a
mechanism cannot satisfy IC, IR, CR, and SE for the
base case of $n = t = 2$. The inductive step then shows
that increasing either $n$ or $t$ cannot alter this impossibility
result.

Base Case: We prove the base case by contradiction.
Assume that there exists a mechanism that satisfies
IC, IR, CR, and SE. This implies that it satisfies these
properties for all possible instances, where we define an
instance as a particular set of agent types and declarations,
task dependencies, and a $V$ function. We will
use 3 possible instances in order to derive properties
that must hold in the mechanism, but lead to a contradiction.
The constants in these instances are that
task 2 is dependent on task 1 and that the center has
value of 5 for task 2 being completed, but no value for
the completion of task 1 in isolation. The five types
that we will use, $\theta_1$, $\theta'_1$, $\theta''_1$, $\theta_2$, and $\theta'_2$, are defined in
Table 2 (the final type, $\theta_e$, will be used later in the
inductive step).


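$\theta_1$      $p_{11} = 1$    $c_{11} = 2$    $p_{12} = 1$    $c_{12} = 1$
$\theta'_1$     $p_{11} = 1$    $c_{11} = 2$    $p_{12} = 0$    $c_{12} = 0$
$\theta''_1$    $p_{11} = 1$    $c_{11} = 0$    $p_{12} = 1$    $c_{12} = 4$
$\theta_2$      $p_{21} = 0$    $c_{21} = 1$    $p_{22} = 0$    $c_{22} = 0$
$\theta'_2$     $p_{21} = 1$    $c_{21} = 1$    $p_{22} = 0$    $c_{22} = 0$
$\theta_e$      $p_{e1} = 0$    $c_{e1} = 1$    $p_{e2} = 0$    $c_{e2} = 1$

Table 2: Agent Types for Theorem 5.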


Instance 1: The true types are $\theta_1$ and $\theta_2$, and the
declared types are $\theta_1$ and $\theta'_2$. To satisfy SE, task 1 is
assigned to agent 2, and task 2 to agent 1. That is,
$A_1(\theta_1, \theta'_2) = (0, 1)$ and $A_2(\theta_1, \theta'_2) = (1, 0)$. Since agent
2’s true type is $\theta_2$, she will fail on task 1, preventing
task 2 from being attempted. Thus, $\mu = (0, 0)$ with
probability 1. The expected utility for agent 1 is then:

$\bar{u}_1(c_1, \theta_1, \theta'_2, (p_1, p_2)) = R_1((\theta_1, \theta'_2), (0, 0))$

³ This is the reason why the current setting is not a special
case of the combinatorial $V$ setting where the valuation
of a set of tasks is the valuation of the subset for which the
prerequisites are met.

Instance 2: The true types are $\theta'_1$ and $\theta_2$, and the declared
types are $\theta_1$ and $\theta'_2$. Thus, the only difference
from instance 1 is agent 1’s true type, which is insignificant
because agent 1 never gets to attempt a task.
Thus, we have a similar expected utility function:

$\bar{u}_1(c'_1, \theta_1, \theta'_2, (p'_1, p_2)) = R_1((\theta_1, \theta'_2), (0, 0))$

Instance 3: The true types are $\theta'_1$ and $\theta_2$, and the declared
types are $\theta'_1$ and $\theta'_2$. Now we have also changed
agent 1’s declared type to $\theta'_1$. Both tasks will be allocated
to the null agent: $A_1(\theta'_1, \theta'_2) = A_2(\theta'_1, \theta'_2) =
(0, 0)$. Therefore, $\mu = (0, 0)$ still holds with probability
1, and we get the following equations:

$\bar{u}_1(c'_1, \theta'_1, \theta'_2, (p'_1, p_2)) = R_1((\theta'_1, \theta'_2), (0, 0))$

$\bar{u}_2(c_2, \theta'_2, \theta'_1, (p'_1, p_2)) = R_2((\theta'_1, \theta'_2), (0, 0))$

If $R_2((\theta'_1, \theta'_2), (0, 0)) < 0$, then IR would be violated
if $\theta'_2$ were indeed the true type of agent 2, because
the assignment function would be the same. Since the
center thus receives no payment from agent 2, and it
never gains any utility from completed tasks, the CR
condition requires that $R_1((\theta'_1, \theta'_2), (0, 0)) \le 0$. Thus,
$\bar{u}_1(c'_1, \theta'_1, \theta'_2, (p'_1, p_2)) \le 0$.

Notice that if agent 1 lied in this instance and declared
her type to be $\theta_1$, then we are in instance 2.
So, to preserve IC, agent 1 must not have incentive
to make this false declaration: $\bar{u}_1(c'_1, \theta_1, \theta'_2, (p'_1, p_2)) =
R_1((\theta_1, \theta'_2), (0, 0)) \le \bar{u}_1(c'_1, \theta'_1, \theta'_2, (p'_1, p_2)) \le 0$.

Instance 1: Now we return to instance 1. Having
shown that $R_1((\theta_1, \theta'_2), (0, 0)) \le 0$, we know that when
agent 1 declares truthfully in this instance, her expected
utility will be: $\bar{u}_1(c_1, \theta_1, \theta'_2, p) \le 0$.

We will now show that agent 1 must have a positive
expected utility if she falsely declares $\theta''_1$.
In this case, both tasks are assigned to agent
1. That is, $A_1(\theta''_1, \theta'_2) = (1, 1)$. We know that
$R_1((\theta''_1, \theta'_2), (1, 1)) \ge 4$ by IR for agent 1, because if
$\theta''_1$ were agent 1’s true type, then both tasks would be
completed and agent 1 would incur a cost of 4.

We now know that if agent 1 falsely declares $\theta''_1$ in instance
1: $\bar{u}_1(c_1, \theta''_1, \theta'_2, p) = R_1((\theta''_1, \theta'_2), (1, 1)) - (c_{11} +
c_{12}) \ge 4 - 3 = 1$. Thus, agent 1 has incentive to falsely
declare $\theta''_1$ in instance 1, violating IC. Thus, we have
reached a contradiction and completed the proof of the
base case.

Inductive Step: We now prove the inductive step,
which consists of two parts: incrementing $n$ and incrementing
$t$. In each case, the inductive hypothesis
is that no mechanism satisfies IC, IR, CR, and SE for
$n = x$ and $t = y$, where $x, y \ge 2$.

Part 1: For the first case, we must show that no
mechanism exists that satisfies IC, IR, CR, and SE
for $n = x + 1$ and $t = y$, which we will prove by contradiction.
Assume that such a mechanism does exist.
There is a one-to-one mapping from every instance in
which $n = x$ and $t = y$ to an instance that only differs
in the addition of an “extra” agent who truthfully
declares her type $\theta_e$. Since such an instance satisfies
$n = x + 1$ and $t = y$, our mechanism must satisfy
IC, IR, CR, and SE for this instance. Because of SE,
this mechanism can never assign the task to the extra
agent. Because of IR, this mechanism can never receive
a positive payment from the extra agent. Since
in each instance the only effect that the extra agent can
have on the mechanism is to receive a payment from
the center, we can transform this mechanism into one
which satisfies IC, IR, CR, and SE for all instances
where $n = x$ and $t = y$ by simply removing the revenue
function and assignment function for the extra
agent, contradicting the inductive hypothesis.

Part 2: For the second case, we need to show that no
mechanism can satisfy IC, IR, CR, and SE for $n = x$
and $t = y + 1$. We use a similar proof by contradiction,
starting from the assumption that such a mechanism
does exist. There is a one-to-one mapping from every
instance in which $n = x$ and $t = y$ to an instance of
$n = x$ and $t = y + 1$ through the addition of an “extra”
task $t_e$ that is not involved in any dependencies and for
which the center has no value for its completion. By
SE, if a satisfying mechanism exists, then there exists
a satisfying mechanism that always assigns this task to
the dummy agent (recall that this is equivalent to not
assigning the task). We can transform this mechanism
into one which satisfies our goals for $n = x$ and $t = y$
by simply removing the assignment of $t_e$ to the dummy
agent. Once again, we have contradicted the inductive
hypothesis, and the proof is complete. ■

Interestingly,  by  slightly  altering  our  mechanism  we 
can  solve  the  problem  in  an  equilibrium. 

Equilibrium Multiple Task Mechanism:

Assignment: The chosen assignment $A$ maximizes
the (declared) expected welfare $\bar{W}(A, \hat{c}, \hat{p}) =
E_{[\mu \mid A, \hat{p}]}[W(A, \hat{c}, \mu)]$.

Payment: The payment to each agent $i$ is defined
according to the completion vector $\mu$: $R_i =
W_{-i}(A, \hat{c}_{-i}, \mu) - W^*_{-i}(\hat{c}_{-i}, \hat{p}_{-i})$

The only difference from the Multiple Task Mechanism
is that the first term of the payment rule uses
the actual completion vector, instead of the distribution
induced by the declaration of the other agents. As
a result, our properties are satisfied only as an equilibrium:
if all agents declare truthfully, then no agent has
incentive to deviate to a different declaration. While
there is no formal name for this type of equilibrium, it
is similar in spirit to a Nash equilibrium, but technically
different because there is no common knowledge
of the game (since privately-known types affect the
utility of other agents). It is also similar to a Bayes-Nash
equilibrium, but much stronger because it holds
regardless of agent beliefs about the prior distributions
for the types of the other agents. We define Equilibrium
IC to hold if truth-telling is such an equilibrium.
Equilibrium IR and SE are defined similarly.

Theorem 6  The Equilibrium Multiple Task Mechanism
satisfies NFR, Equilibrium IC, Equilibrium IR,
and Equilibrium SE, even when dependencies exist.

6  COST  VERIFICATION 

A practical drawback of our mechanisms is that the
payments (or fines) may be very large, especially when
the service provider is far more efficient than the other
agents. Also, since CR is not satisfied in our most
general settings, the designer has to take a risk.

Previous work (Nisan & Ronen 2001) has stressed the
importance of ex post verification. It showed that
when the designer can verify the costs and the actions
of the agents after the work was done, the power of
the designer increases dramatically. All of our previous
constructions have corresponding versions that use
verification. The main advantage of these mechanisms
is that the payments can be normalized by any linear
function, thus making the potential losses more reasonable
for both the agents and the center. Due to
space constraints we omit these constructions.

7  DISCUSSION  &  FUTURE  WORK 

In this paper we studied task allocation problems in
which agents may fail to complete their assigned tasks.
For the settings we considered (single task, multiple
tasks with combinatorial properties, and multiple
tasks with dependencies) we provided either a mechanism
that satisfies our goals or an impossibility result.

It is worth pointing out that all of the results in this paper
hold when we expand the set of possible failures to
include rational, intentional failures, which occur when
an agent increases her utility by not attempting an assigned
task (and thus not incurring the corresponding
cost). Modelling this possibility would complicate our
model without changing any of our results. Intuitively,
our positive results continue to hold because the payment
rule aligns an agent’s utility with the welfare of
the system. If failing to attempt some subset of the
assigned tasks would increase the welfare, then these
tasks would not have been assigned to any agent. Obviously,
all impossibility results would still hold when
we expand the set of possible actions for the agents.

Many interesting directions stem from this work. Two
possibilities are retrying tasks after a failure or allowing
multiple agents to attempt the same task in parallel.
The computation of our allocation and payment
rules  presents  non-trivial  algorithmic  problems.  Also, 
the  payment  properties  for  the  center  may  be  further 
investigated,  especially  in  settings  where  CR  must  be 
sacrificed  to  satisfy  our  other  goals. 

Finally,  we  believe  that  the  most  important  future 
work  will  be  to  consider  a  wider  range  of  possible 
failures,  and  to  discover  new  mechanisms  to  overcome 
them.  In  particular,  we  would  like  to  explore  the  case 
in  which  agents  may  fail  maliciously  or  irrationally. 
For  this  case,  even  developing  a  reasonable  model  of 
the  setting  provides  a  major  challenge. 
Bidding Clubs in First-Price Auctions*

Kevin  Leyton-Brown  Yoav  Shoham  Moshe  Tennenholtz 


Abstract 

We  introduce  a  class  of  mechanisms,  called  bidding  clubs,  that  allow 
agents  to  coordinate  their  bidding  in  auctions.  Bidding  clubs  invite  a  set 
of  agents  to  join,  and  each  invited  agent  freely  chooses  whether  to  accept 
the  invitation  or  to  participate  independently  in  the  auction.  Agents 
who  join  a  bidding  club  first  conduct  a  “knockout  auction”  within  the 
club;  depending  on  the  outcome  of  the  knockout  auction  some  subset  of 
the  members  of  the  club  bid  in  the  primary  auction  in  a  prescribed  way. 
We  model  this  setting  as  a  Bayesian  game,  including  agents’  choices  of 
whether  or  not  to  accept  a  bidding  club’s  invitation.  After  describing  this 
general  setting,  we  examine  the  specific  case  of  bidding  clubs  for  first- 
price  auctions.  We  show  the  existence  of  a  Bayes-Nash  equilibrium  where 
agents  choose  to  participate  in  bidding  clubs  when  invited  and  truthfully 
declare  their  valuations  to  the  coordinator.  Furthermore,  we  show  that 
the  existence  of  bidding  clubs  benefits  all  agents  (both  inside  and  outside 
of bidding clubs).


1  Introduction 

Most  work  on  auctions  concentrates  on  the  design  of  auction  protocols  from  the 
seller’s perspective, and in particular on optimal (i.e., revenue maximizing) auction
design. In this paper we present a class of systems to assist sets of bidders,
bidding clubs. The idea is similar to the idea behind “buyer clubs”: aggregating
the market power of individual bidders. Buyer clubs work when buyers’ interests
are perfectly aligned; the more buyers join in a purchase the lower the price
for  everyone.  In  auctions  it  is  relatively  easy  for  multiple  agents  to  cooperate, 
hiding  behind  a  single  auction  participant.  Intuitively,  these  bidders  can  reduce 
their  payment  if  they  win,  by  causing  others  to  lower  their  bids  in  the  case  of  a 
first-price  auction  or  by  possibly  removing  the  second-highest  bidder  in  the  case 
of  a  second-price  auction.  However,  the  situation  in  auctions  is  not  as  simple 
as  in  buyer  clubs,  because  while  bidders  can  gain  by  sharing  information,  the 
competitive  nature  of  auctions  means  that  bidders’  interests  are  not  aligned. 
Thus  there  is  a  complex  strategic  relationship  among  bidders  in  a  bidding  club, 
and  bidding  club  rules  must  be  designed  accordingly. 

*  Thanks  to  Navin  Bhat  and  Ryan  Porter  for  very  helpful  discussions  about  Theorem  3.  This 
work was supported by DARPA grant F30602-00-2-0598 and a Stanford Graduate Fellowship.
1.1 Related Work

Below we discuss the most relevant previous work and its relation to ours, noting
the relative scarcity of previous work on bidder-centric mechanisms. This
work all comes under the umbrella of self-enforcing collusive protocols for non-repeated
auctions. Collusion is a negative term reflecting a seller-oriented perspective;
since we adopt a more neutral stance towards such bidder activities,
we  use  the  term  bidding  clubs  rather  than  the  terms  bidding  rings  and  cartels 
that  have  been  used  in  the  past.  However,  the  technical  development  is  not 
impacted  by  such  subtle  differences  in  moral  attitude. 

1.1.1  Collusion  in  Second-Price  Auctions 

One  of  the  first  formal  papers  to  consider  collusion  in  second-price  auctions 
was  written  by  Graham  and  Marshall  [3].  This  paper  introduces  the  knockout 
procedure:  agents  announce  their  bids  in  a  knockout  auction;  only  the  highest 
bidder  goes  to  the  auction  but  this  bidder  must  pay  a  “ring  center”  the  amount 
of  his  gain  relative  to  the  case  where  there  was  no  collusion.  The  ring  center 
pays  each  agent  in  advance;  the  amount  of  this  payment  is  calculated  so  that  the 
ring center will budget-balance ex ante, before knowing the agents’ valuations.

Graham  and  Marshall’s  work  has  been  extended  to  deal  with  variations  in  the 
knockout  procedure,  differential  payments,  and  relations  to  the  Shapley  value 
[4]. The case where only some of the agents are part of the cartel is discussed by
Mailath  and  Zemsky  [9].  Ungern  and  Sternberg  [14]  discuss  collusion  in  second- 
price  auctions  where  the  designated  winner  of  a  cartel  is  not  the  agent  with 
the  highest  valuation.  Although  not  presented  in  any  existing  work  of  which 
we  are  aware,  it  is  also  easy  to  extend  Graham  and  Marshall’s  protocol  to  an 
environment  where  multiple  cartels  may  operate  in  the  same  auction  alongside 
independent  bidders. 

1.1.2  Collusion  in  First-Price  Auctions 

There is little formal work on collusion in first-price auctions, the most important
exception being a very influential paper by McAfee and McMillan [11].
It  is  the  closest  in  the  literature  to  our  work,  and  indeed  we  have  borrowed 
some  modelling  elements  from  it.  Several  sections,  including  the  discussion  of 
enforcement  and  the  argument  for  independent  private  values  as  a  model  of 
agents’  valuations,  are  directly  applicable  to  our  paper.  However,  the  setting 
introduced  in  their  work  assumes  that  a  fixed  number  of  agents  participate  in 
the auction and that all agents are part of a single cartel that coordinates its behavior
in the auction. The authors show optimal collusion protocols for “weak”
cartels (in which transfers between agents are not permitted: all bidders bid the
reserve price, using the auctioneer’s tie-breaking rule to randomly select a winner)
and for “strong” cartels (the cartel holds a knockout auction, the winner of
which  bids  the  reserve  price  in  the  main  auction  while  all  other  bidders  sit  out; 
the  winner  distributes  some  of  his  gains  to  other  cartel  members  through  side 
payments).  A  small  part  of  the  paper  deals  with  the  case  where  in  addition  to 
a single cartel there are also additional agents. However, results are shown only
for two cases: (1) when non-cartel members bid without taking the existence of
a cartel into account and (2) when each agent $i$ has valuation $v_i \in \{0, 1\}$. The
authors  explain  that  they  do  not  attempt  to  deal  with  general  strategic  behavior 
in  the  case  where  the  cartel  consists  of  only  a  subset  of  the  agents;  furthermore, 
they  do  not  consider  the  case  where  multiple  cartels  can  operate  in  the  same 
auction.  Finally,  a  brief  presentation  of  “cartel-formation  games”  is  related  to 
our  discussion  of  agents’  decision  of  whether  or  not  to  accept  an  invitation  to 
join  a  bidding  club. 

1.1.3  Other  Work  on  Collusion 

Less  formal  discussion  of  collusion  in  auctions  can  be  found  in  a  wider  variety 
of  papers.  For  example,  a  survey  paper  that  discusses  mechanisms  that  are 
likely  to  facilitate  collusion  in  auctions,  as  well  as  methods  for  the  detection  of 
such schemes, can be found in [6]. A discussion and comparison of the stability
of  rings  associated  with  classical  auctions  can  be  found  in  [13],  concentrating 
on  the  case  where  the  valuations  of  agents  in  the  cartel  are  honestly  reported. 
Collusion  is  also  discussed  in  other  settings,  e.g.,  aiming  to  influence  purchaser 
behavior  in  a  repeated  procurement  setting  (see  [2])  and  in  the  context  of  general 
Bertrand  or  Cournot  competition  (see  [1]). 

Our  previously  published  work  anticipates  some  of  the  results  reported  here. 
Specifically,  in  [7]  we  considered  bidding  clubs  under  the  assumptions  that  only  a 
single  bidding  club  exists,  and  that  bidders  who  were  not  invited  to  join  the  club 
are  not  aware  of  the  possibility  that  a  bidding  club  might  exist.  The  current 
paper  is  an  extension  and  generalization  of  that  earlier  work.  An  extended 
abstract of the current paper appeared in AAAI-02 [8].

2  Technical  Preliminaries 

Our goal is to extend past work on bidder cooperation in first-price auctions
to  the  standard  game-theoretic  setting  in  which  all  agents  (both  cartel  members 
and  non-members)  are  rational,  and  act  in  equilibrium  based  on  true  knowledge 
of  the  economic  environment.  We  also  want  to  increase  realism  by  allowing 
for  the  possibilities  that  more  than  one  cartel  will  exist  (introducing  the  new 
wrinkle  that  cartel  members  must  reason  about  the  behavior  of  other  cartels) 
and  that  some  agents  will  not  belong  to  any  cartel.  Of  course  we  also  want  to 
allow  for  real-numbered  valuations  drawn  from  an  interval,  as  compared  to  the 
case  studied  in  [11]  where  valuations  take  one  of  only  two  discrete  values. 

2.1  Auction  Setting 

In  this  section  we  give  a  formal  description  of  the  auction  setting  and  introduce 
notation.  An  economic  environment  E  consists  of  a  finite  set  of  agents  who 
have  non-negative  valuations  for  a  good  at  auction,  and  a  distinguished  agent 
0, the seller or center. Denote the economic environment described here as $E_c$.
Let $T$ be the set of possible agent types. The type $\tau_i \in T$ of agent $i$ is the pair
$(v_i, s_i) \in V \times S$. $v_i$ denotes an agent’s valuation: his maximal willingness to
pay for the good offered by the center. We assume that $v_i$ represents a purely
private valuation for the good, and that $v_i$ is selected independently from the
other $v_j$’s of other agents from a known distribution, $F$, having density function
$f$. By $s_i$ we denote agent $i$’s signal: his private information about the number
of agents in the auction. The set of possible signals will be varied throughout
the paper; in $E_c$ let $S = \{\emptyset\}$. Note, however, that the economic environment
itself is always common knowledge, and so agents always have some information
about the number of agents even when they always receive the null signal.

By $p_n^{\tau_i}$ we denote the probability that agent $i$ assigns to there being exactly
$n$ agents in the auction, conditioned on his type $\tau_i$. We denote the whole
distribution conditioned on $i$’s type as uppercase $P^{\tau_i}$. The utility function of
agent $i$, $u_i : \mathbb{R} \to \mathbb{R}$, is linear, normalized with $u_i(0) = 0$. The utility of agent
$i$ (having valuation $v_i$) when asked to pay $t$ is $v_i - t$ if $i$ is allocated a good,
and it is 0 otherwise. Thus, we assume that there are no externalities in agents’
valuations and that agents are risk-neutral. $b_i : T \to \mathbb{R}$ denotes agent $i$’s
strategy, a mapping from $i$’s type $\tau_i$ to his declaration in the auction. This may
be the null declaration, indicating that $i$ will not participate in the auction.

2.2  Classical  first-price  auctions 

It  is  instructive  to  consider  the  reasons  why  most  previous  work  in  collusion 
has  focused  on  second-price  rather  than  first-price  auctions.  Since  second-price 
auctions  give  rise  to  dominant  strategies,  and  since  colluding  agents  can  gain  by 
having  other  agents  drop  out  without  changing  their  own  bidding  behavior,  it 
is  possible  to  study  collusion  in  many  settings  related  to  these  auctions  without 
performing  strategic  equilibrium  analysis.  In  particular,  agents  outside  a  cartel 
have  no  reason  to  change  their  strategies  if  they  suspect  (or  even  know)  that 
collusion  is  taking  place.  In  first-price  auctions  agents  who  are  not  part  of  the 
cartel  must  take  into  account  the  likelihood  of  collusion  in  deciding  what  they 
should  bid,  since  their  strategy  amounts  to  predicting  the  second-highest  bid 
conditional  on  their  bid  being  highest,  and  this  computation  depends  on  the 
total  number  of  agents.  The  settings  in  [11]  are  largely  designed  to  overcome 
this  problem:  e.g.,  if  all  agents  belong  to  the  cartel,  or  if  non-cartel  agents  are 
assumed  to  play  as  though  collusion  is  impossible,  the  question  of  how  cartel 
members  and  non-cartel  members  reason  about  each  other  is  avoided. 

This suggests that the choice of information structure will make a real difference
for the study of collusion in first-price auctions. The most familiar is what
we will call the “classical” first-price auction, where the number of participants
is part of the economic environment (as in $E_c$). The equilibrium analysis of
classical first-price auctions is quite standard^1:

^1 When we say that $n$ agents participate in the auction we do not count the distinguished
agent 0, who is always present.
Proposition 1 If valuations are selected independently according to the uniform
distribution on [0, 1] then it is a unique symmetric equilibrium for each
agent $i$ to follow the strategy:

$$b(v_i) = \frac{n-1}{n}\, v_i.$$

Using classical equilibrium analysis (e.g., following Riley and Samuelson [12])
classical first-price auctions can be generalized to an arbitrary continuous distribution
$F$.

Proposition 2 If valuations are selected from a continuous distribution $F$ then
it is a unique symmetric equilibrium for each agent $i$ to follow the strategy:

$$b(v_i) = v_i - F(v_i)^{-(n-1)} \int_0^{v_i} F(u)^{n-1}\,du.$$
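As a consistency check (a worked special case added for illustration, not part of the original derivation), substituting the uniform distribution $F(u) = u$ on $[0, 1]$ into the strategy of Proposition 2 recovers the strategy of Proposition 1 for any $v_i \in (0, 1]$:

```latex
\begin{align*}
b(v_i) &= v_i - F(v_i)^{-(n-1)} \int_0^{v_i} F(u)^{n-1}\,du \\
       &= v_i - v_i^{-(n-1)} \int_0^{v_i} u^{n-1}\,du \\
       &= v_i - v_i^{-(n-1)} \cdot \frac{v_i^{n}}{n} \\
       &= v_i - \frac{v_i}{n} \;=\; \frac{n-1}{n}\, v_i .
\end{align*}
```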


In both cases, observe that the strategy is parameterized by valuation, and
also depends on information from the economic environment. It will be notationally
useful for us to be able to specify the amount of the equilibrium bid as
a function of both $v$ and $n$:

$$b^e(v_i, n) = v_i - F(v_i)^{-(n-1)} \int_0^{v_i} F(u)^{n-1}\,du. \qquad (1)$$
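The equilibrium bid as a function of valuation and number of bidders can be evaluated numerically. The following is a minimal sketch, assuming valuations are non-negative and that the distribution is supplied as a CDF callable; the function names, the midpoint-rule integration, and the boundary convention at the bottom of the support are illustrative choices, not part of the paper.

```python
from typing import Callable


def equilibrium_bid(v: float, n: int, cdf: Callable[[float], float],
                    steps: int = 10_000) -> float:
    """Approximate b^e(v, n) = v - F(v)^-(n-1) * integral_0^v F(u)^(n-1) du.

    The integral is approximated with the midpoint rule on `steps` cells.
    """
    if n < 1:
        raise ValueError("n must be a positive number of bidders")
    f_v = cdf(v)
    if v <= 0.0 or f_v <= 0.0:
        # Boundary convention assumed for this sketch: a bidder at the
        # bottom of the support bids his valuation.
        return v
    width = v / steps
    integral = width * sum(cdf((k + 0.5) * width) ** (n - 1) for k in range(steps))
    return v - integral / f_v ** (n - 1)


def uniform_cdf(u: float) -> float:
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(u, 0.0), 1.0)
```

For the uniform distribution the result should agree, up to integration error, with the closed form of Proposition 1; for example `equilibrium_bid(0.8, 4, uniform_cdf)` is approximately `0.6`.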

We are interested in constructing a bidding club protocol: a collusive agreement
that requires low bidders to drop out of the main auction if they lose in a
knockout  auction.  It  is  immediately  obvious  that  such  collusion  is  nonsensical 
in  a  classical  first-price  auction.  Since  all  agents  have  full  knowledge  of  the 
economic  environment,  they  all  know  the  true  number  of  agents;  as  a  result,  it 
will  not  matter  to  agents  outside  a  cartel  if  cartel  members  with  low  valuations 
drop  out,  and  so  the  original  equilibrium  (based  on  the  true  number  of  agents  in 
the  environment)  will  still  hold.  This  seems  more  of  a  problem  with  our  auction 
model  than  with  collusion  in  first-price  auctions  per  se — in  practice  bidders  do 
not  know  how  many  agents  have  declined  to  participate,  because  they  don’t 
actually  know  the  number  of  agents  in  the  economic  environment.  The  next 
section  considers  an  economic  environment  that  addresses  this  issue. 

2.3  First-price  auctions  with  stochastic  number  of  bidders 

One  way  of  modelling  agents’  uncertainty  about  the  number  of  opponents  they 
face  is  to  say  that  the  number  of  participants  is  chosen  stochastically  from  a 
probability  distribution,  and  while  the  number  of  participants  is  not  known  to 
the individual agents (not being part of the economic environment) the distribution
is commonly known [10]. This setting requires that we redefine the
economic environment; denote the new economic environment as $E_s$. Let the
set of agents who may participate in the economic environment be $\mathcal{A} = \mathbb{N}$. Let
$\beta_A$ represent the probability that a finite set $A \subset \mathcal{A}$ is the set of agents. The
probability that $n$ agents will participate in the auction is $\gamma_{\mathcal{A}}(n) = \sum_{A, |A|=n} \beta_A$.
All agents know the probability distribution $\beta_A$. Once an agent $k$ is selected,
he updates his probability of the number of agents present as:

$$p_n^k = \frac{\sum_{A,\, |A|=n,\, k \in A} \beta_A}{\sum_{A,\, k \in A} \beta_A}. \qquad (2)$$
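The belief update of a selected agent can be sketched directly from its definition. The example below assumes, purely for illustration, a finite pool of three potential agents (the model itself draws from an infinite set) and represents the selection probabilities as a dictionary from agent sets to probabilities; all names are hypothetical.

```python
from typing import Dict, FrozenSet

Beta = Dict[FrozenSet[int], float]


def gamma(beta: Beta, n: int) -> float:
    """Prior probability that exactly n agents participate."""
    return sum(p for agents, p in beta.items() if len(agents) == n)


def updated_belief(beta: Beta, k: int, n: int) -> float:
    """Probability agent k assigns to n participants, given that k was selected."""
    present = sum(p for agents, p in beta.items() if k in agents)
    if present == 0.0:
        raise ValueError("agent k is never selected under beta")
    joint = sum(p for agents, p in beta.items()
                if k in agents and len(agents) == n)
    return joint / present


# Illustrative symmetric distribution over subsets of the pool {1, 2, 3}.
beta_example: Beta = {
    frozenset({1, 2}): 0.2,
    frozenset({1, 3}): 0.2,
    frozenset({2, 3}): 0.2,
    frozenset({1, 2, 3}): 0.4,
}
```

In this example the prior probability of two participants is 0.6, yet a selected agent assigns it only 0.5: being selected is itself evidence that the set of agents is larger. Because the example distribution treats all agents symmetrically, the updated belief is the same for every agent.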

We deviate from the model in [10] by adding the assumption that all bidders
are equally likely to be chosen. Hence $p_n^k$ is the same for all $k$: we will hereafter
refer only to $p_n$. Finally, we assume that $\gamma_{\mathcal{A}}(0) = \gamma_{\mathcal{A}}(1) = 0$; at least two agents
will participate in the auction.

An  equilibrium  for  this  setting  was  demonstrated  by  Harstad,  Kagel  and 
Levin  [5]: 


Proposition 3 If valuations are selected from a continuous distribution $F$ and
the number of bidders is selected from the distribution $P$ then it is a unique
symmetric equilibrium for each agent $i$ to follow the strategy:

$$b(v_i) = \sum_j \frac{F^{j-1}(v_i)\, p_j}{\sum_k F^{k-1}(v_i)\, p_k}\, b^e(v_i, j).$$


Observe that $b^e(v_i, j)$ is the amount of the equilibrium bid for a bidder with
valuation $v_i$ in a setting with $j$ bidders as described in section 2.2 above. $P$ is
deduced from the economic environment.^2 We overload our previous notation
for the equilibrium bid, this time as a function of the agent’s valuation and the
probability distribution $P$:


$$b^e(v_i, P) = \sum_j \frac{F^{j-1}(v_i)\, p_j}{\sum_k F^{k-1}(v_i)\, p_k}\, b^e(v_i, j). \qquad (3)$$
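The equilibrium bid under a stochastic number of bidders is a weighted average of fixed-$n$ equilibrium bids, where the weight on $j$ bidders is proportional to $F^{j-1}(v_i)\,p_j$. A minimal sketch follows, assuming the distribution over the number of bidders is given as a dictionary and the fixed-$n$ bid is passed in as a callable; the names and the uniform example are illustrative assumptions.

```python
from typing import Callable, Dict


def stochastic_equilibrium_bid(
    v: float,
    p: Dict[int, float],
    cdf: Callable[[float], float],
    bid_given_n: Callable[[float, int], float],
) -> float:
    """Weighted average of fixed-n bids, with weights F(v)^(j-1) * p_j."""
    weights = {j: cdf(v) ** (j - 1) * p_j for j, p_j in p.items()}
    total = sum(weights.values())
    if total <= 0.0:
        # Boundary convention assumed for this sketch (bottom of the support).
        return v
    return sum(w / total * bid_given_n(v, j) for j, w in weights.items())


def uniform_cdf(u: float) -> float:
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(u, 0.0), 1.0)


def uniform_bid(v: float, j: int) -> float:
    """Closed-form fixed-n equilibrium bid for uniform valuations."""
    return (j - 1) / j * v
```

When the distribution puts all its mass on a single $j$, the weighted average collapses to the fixed-$n$ bid, which is one way to sanity-check an implementation.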


Unfortunately, this auction model is still not rich enough to express our intuition
about how agents could collude in a first-price auction. If each agent
knows only a distribution on the total number of agents interested in participating
in the auction, then he has no way of knowing that other agents have
dropped  out!  It  seems  reasonable  that  agents  will  sometimes  know  how  many 
agents  are  placing  bids  in  the  auction,  even  though  they  may  not  know  the 
number  of  agents  who  chose  not  to  participate  at  all.  For  example,  when  an 
auction  takes  place  in  an  auction  hall,  no  bidder  knows  how  many  potential 
bidders  stayed  home,  but  every  bidder  can  count  the  number  of  people  in  the 
room  before  placing  his  or  her  bid.  It  is  in  this  sort  of  auction  that  we  could 
hope  collusion  based  on  dropping  agents  with  low  valuations  would  work.  We 
must  first  introduce  a  new  type  of  auction  to  model  this  auction  hall  scenario. 

^2 Recall that $P$ is a set: $p_j \in P$ for all $j > 0$, where $p_j$ denotes the probability that the
economic environment contains exactly $j$ agents.
2.4 First-price auctions with participation revelation

First-price  auctions  with  participation  revelation  are  defined  as  follows: 

1.  Agents  indicate  their  intention  to  bid  in  the  auction. 

2.  The  auctioneer  announces  n,  the  number  of  agents  who  registered  in  the 
first  phase. 

3.  Agents  submit  bids  to  the  auctioneer.  The  auctioneer  will  only  accept 
bids  from  agents  who  registered  in  the  first  phase. 

4.  The  agent  who  submitted  the  highest  bid  is  awarded  the  good  for  the 
amount  of  his  bid;  all  other  agents  are  made  to  pay  0. 
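The four phases above can be sketched as a small simulation. This is an illustrative sketch only: the function name, the representation of agents as dictionary keys, and the tie-breaking rule (the first highest bidder in iteration order wins) are assumptions not specified in the text.

```python
from typing import Callable, Dict, Optional, Tuple


def participation_revelation_auction(
    valuations: Dict[str, float],
    registers: Callable[[str], bool],
    bid_rule: Callable[[float, int], float],
) -> Tuple[Optional[str], float, int]:
    """Run one auction; return (winner, price paid by winner, announced n)."""
    # Phase 1: agents indicate their intention to bid.
    registered = [agent for agent in valuations if registers(agent)]
    # Phase 2: the auctioneer announces the number of registered agents.
    n = len(registered)
    if n == 0:
        return None, 0.0, 0
    # Phase 3: only registered agents may bid, and they may condition on n.
    bids = {agent: bid_rule(valuations[agent], n) for agent in registered}
    # Phase 4: the highest bidder wins and pays his bid; all others pay 0.
    winner = max(bids, key=lambda agent: bids[agent])
    return winner, bids[winner], n


def uniform_rule(v: float, n: int) -> float:
    """Fixed-n equilibrium bid for uniform valuations, used as an example rule."""
    return (n - 1) / n * v
```

For instance, with valuations 0.9, 0.6 and 0.3 and every agent registering, the announced number is 3 and the agent with valuation 0.9 wins. If the lowest-valuation agent stays out, the announced number drops to 2 and the same winner pays less, which is the effect a bidding club aims to exploit.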

When a first-price auction with participation revelation operates in $E_s$, the
equilibrium of the corresponding classical first-price auction holds.

Proposition 4 In $E_s$ it is an equilibrium of the first-price auction with participation
revelation for every agent $i$ to indicate the intention to participate and
to bid according to $b^e(v_i, n)$.

Proof. Agents are always better off participating in first-price auctions
as long as there is no participation fee. The only way of participating is to
declare the intention to participate in the first phase of the auction. Thus the
number of agents announced by the auctioneer is equal to the total number of
agents in the economic environment. From Proposition 2 it is best for agent $i$
to bid $b^e(v_i, n)$ when it is common knowledge that the number of agents in the
economic environment is $n$. ∎

Settings modelled using classical first-price auctions may often be more appropriately
modelled as first-price auctions with participation revelation, since
bidders  rarely  know  a  priori  the  number  of  opponents  they  will  face.  However, 
when  bidders  are  unable  to  collude  there  is  no  strategic  difference  between  these 
two  mechanisms,  explaining  why  the  simpler  classical  model  is  commonly  used. 
For  the  study  of  bidding  clubs,  however,  the  difference  between  the  mechanisms 
is  profound — we  are  now  able  to  make  the  standard  assumption  that  bidders 
have  complete  knowledge  of  the  economic  environment,  while  still  finding  that 
bidder  strategies  are  affected  by  the  number  of  other  agents  who  indicate  an 
intention  to  participate  in  the  auction. 

2.5  Distinguishing  Features  of  our  Model 

Having  justified  our  setting,  it  is  worthwhile  to  emphasize  the  main  differences 
between  our  model  of  collusion  and  models  proposed  in  the  work  surveyed  above 
(particularly  [4]  and  [11]): 

1.  The  number  of  bidders  is  stochastic. 
2. There is no minimum number of bidders in a bidding club (e.g., bidding
clubs are not required to contain all bidders).^3

3.  There  is  no  limit  to  the  number  of  bidding  clubs  in  a  single  auction. 

4. Club members and independent bidders behave strategically, acting according
to correct beliefs about their environment.

Additionally,  we  make  several  restrictions  on  the  bidding  club  protocols  that 
we  are  willing  to  consider.  None  of  these  is  required  for  the  construction  of  a 
working  protocol,  but  we  feel  that  each  of  these  characteristics  is  necessary  for 
bidding  clubs  to  be  a  realistic  model  of  bidder  cooperation: 

1.  Participation  in  bidding  clubs  requires  an  invitation,  but  bidders  must  be 
free  to  decline  this  invitation  without  (direct)  penalty.  In  this  way  we 
include  the  choice  to  collude  as  one  of  agents’  strategic  decisions,  rather 
than  starting  from  the  assumption  that  agents  will  collude. 

2. Bidding club coordinators must make money in expectation. This ensures
that third parties have an incentive to act as bidding club coordinators.

3.  The  bidding  club  protocol  must  give  rise  to  an  equilibrium  where  all  invited 
agents  choose  to  participate,  even  when  the  bidding  club  operates  in  a 
single  auction  as  opposed  to  a  sequence  of  auctions.  This  means  that 
agents cannot be induced to collude in a given auction by the threat of
being  denied  future  opportunities  to  collude. 

2.6  Overview 

Section  3  expands  the  auction  models  and  economic  environments  described 
above  to  the  bidding  club  setting.  Section  4  examines  bidding  club  protocols 
for  first-price  auctions.  After  giving  assumptions  and  two  lemmas,  we  give  a 
bidding  club  protocol  for  first-price  auctions  with  participation  revelation.  Our 
main  technical  results  are  that: 

• It is an equilibrium for agents to accept invitations to join bidding clubs
when invited and to disclose their true valuations to their bidding club’s
coordinator, and for singleton agents to bid as they would in an auction
with a stochastic number of participants in an economic environment
without bidding clubs, in which the distribution over the number of participants
is the same as in the bidding clubs setting.

•  In  equilibrium  each  agent  is  better  off  as  a  result  of  his  own  club  (that 
is,  his  expected  payoff  is  higher  than  would  have  been  the  case  if  his  club 
never  existed,  but  other  clubs — if  any — still  did  exist). 

^3 For technical reasons we will have to assume that there is a finite maximum number of
bidders  in  each  bidding  club;  however,  this  maximum  may  be  any  integer  greater  than  or 
equal  to  two. 
• In equilibrium each club increases all non-members’ expected payoffs, as
compared  to  equilibrium  in  the  case  where  all  club  members  participated  in 
the  auction  as  singleton  bidders,  but  all  other  clubs — if  any — still  existed. 

•  In  equilibrium  each  agent  is  either  better  off  or  equally  well  off  belonging 
to  a  bidding  club  as  compared  to  equilibrium  in  the  case  where  no  clubs 
exist. 

Finally, section 5 touches on questions of trustworthiness of coordinators,
legality of bidding clubs and steps an auctioneer could take to disrupt the operation
of bidding clubs.

3  Bidding  Club  Auction  Model 

In this section we extend both the economic environment and auction mechanism
described above to include the characteristics necessary for a model of
bidding  clubs.  As  described  above,  our  aim  is  not  to  model  a  situation  where 
agents’  decision  to  collude  is  exogenous,  as  this  would  gloss  over  the  question  of 
whether  the  collusion  is  stable.  We  thus  include  the  collusive  protocol  as  part 
of  the  model  and  show  that  it  is  individually  rational  ex  post  (i.e.,  after  agents 
have  observed  their  valuations)  for  agents  to  choose  to  collude.  However,  we  do 
consider exogenous the selection of the set of agents who are offered the opportunity
to collude. Furthermore, we want to show the impact of the possibility
of collusion upon non-colluding agents; indeed, even colluding agents must take
into account the possibility that other groups of agents in the auction may also
be colluding. Once we have defined the new economic environment and auction
mechanism, a well-defined Bayesian game will be specified by every tuple of primary
auction type, bidding club rules and distributions of agent types, number
of agents and number of bidding clubs.

3.1  The  Economic  Environment 

We extend the economic environment $E_s$ to consist of a set of agents who have
non-negative valuations for a good at auction, the distinguished agent 0 and a set
of bidding club coordinators who do not value the good, but may invite agents
to participate in a bidding club. We will denote the new economic environment
$E_{bc}$. Intuitively, in $E_{bc}$ an agent’s belief update after observing the number of
agents  in  his  bidding  club  does  not  result  in  any  change  in  the  distribution  over 
the  number  of  other  agents  in  the  auction,  because  the  number  of  agents  in  each 
bidding  club  is  independent  of  the  number  of  agents  in  every  other  bidding  club. 

3.1.1  Coordinators 

Coordinators  are  not  free  to  choose  their  own  strategies;  rather,  they  act  as 
part  of  the  mechanism  for  a  subset  of  the  agents  in  the  economic  environment. 
We  select  coordinators  in  a  process  analogous  to  the  approach  for  exogenously 
selecting agents in [10]: we draw a finite set of individuals from an infinite set of
potential coordinators. In this case, however, this finite set is considered “potential
coordinators”; in section 3.1.2 we will describe which potential coordinators
are “actualized”, i.e., correspond to actual coordinators.

Let $\mathcal{C} = \mathbb{N}$ (excluding 0) be the set of all coordinators. $\beta_C$ represents the
probability that a finite set $C \subset \mathcal{C}$ is selected to be the set of potential coordinators.
We add the restriction that all coordinators are equally likely to be chosen.
A consequence of this restriction is that an agent’s knowledge of the coordinator
with whom he is associated does not give him additional information about what
other coordinators may have been selected. We denote the probability that an
auction will involve $n_c$ potential coordinators as $\gamma_{\mathcal{C}}(n_c) = \sum_{C, |C|=n_c} \beta_C$. We
assume that $\gamma_{\mathcal{C}}(0) = \gamma_{\mathcal{C}}(1) = 0$: at least two potential coordinators will be
associated with each auction.

3.1.2  Agents 

We  independently  associate  a  random  number  of  agents  with  each  potential 
coordinator,  again  drawing  a  finite  set  of  actual  agents  from  an  infinite  set 
of  potential  agents.  If  only  one  (actual)  agent  is  associated  with  a  potential 
coordinator,  the  potential  coordinator  will  not  be  actualized  and  hence  the  agent 
will  not  belong  to  a  bidding  club.  In  this  way  we  model  agents  who  participate 
directly  in  the  auction  without  being  associated  with  a  coordinator.  If  more 
than  one  agent  is  associated  with  a  potential  coordinator,  the  coordinator  is 
actualized  and  all  its  associated  agents  receive  an  invitation  to  participate  in 
the  bidding  club. 

Let $\mathcal{A} = \mathbb{N}$ be the set of all agents, and let
$k \in \mathbb{N} \setminus \{0, 1\}$ be the maximum number of agents who may
be associated with a single bidding club. Partition $\mathcal{A}$ into
subsets, where agent $i$ belongs to the subset
$\mathcal{A}_{\lfloor i/k \rfloor}$. Let $\beta_A$ be the probability that a
finite set $A \subset \mathcal{A}_i$ is the set of agents associated with
potential coordinator $i$; we assume that all agents are equally likely to be
chosen. The probability that $n$ agents will be associated with a potential
coordinator is denoted $\gamma_A(n) = \sum_{A, |A| = n} \beta_A$. By the
definition of $k$, $\forall j > k,\ \gamma_A(j) = 0$; we assume that
$\gamma_A(0) = 0$ and that $\gamma_A(1) < 1$.
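
The two-stage draw described in sections 3.1.1 and 3.1.2 can be sketched as follows. This is only an illustration: the function names and the toy values of `gamma_c` and `gamma_a` are assumptions chosen to satisfy the stated restrictions, not values from the paper.

```python
import random

def sample_environment(gamma_c, gamma_a, rng):
    """Draw one realization of the coordinator/agent model.

    gamma_c: dict mapping a number of potential coordinators to its probability
    gamma_a: dict mapping a number of agents per coordinator to its probability
    Returns a list with one entry per potential coordinator: its agent count.
    """
    n_c = rng.choices(list(gamma_c), weights=list(gamma_c.values()))[0]
    return [rng.choices(list(gamma_a), weights=list(gamma_a.values()))[0]
            for _ in range(n_c)]

def split_clubs(sizes):
    """Separate actualized bidding clubs (size > 1) from singleton bidders."""
    clubs = [s for s in sizes if s > 1]
    singletons = sum(1 for s in sizes if s == 1)
    return clubs, singletons

# Toy distributions (hypothetical values).
gamma_c = {2: 0.5, 3: 0.5}          # gamma_C(0) = gamma_C(1) = 0
gamma_a = {1: 0.4, 2: 0.3, 3: 0.3}  # gamma_A(0) = 0, gamma_A(1) < 1, k = 3

rng = random.Random(0)
sizes = sample_environment(gamma_c, gamma_a, rng)
clubs, singletons = split_clubs(sizes)
```

A potential coordinator that draws exactly one agent is not actualized, so that agent is counted among the singleton bidders.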

3.1.3  Types  and  Signals 

Recall that the type $\tau_i \in T$ of agent $i$ is the pair
$(v_i, s_i) \in V \times S$. Let $S = \mathbb{N} \setminus \{0\}$; $s_i$
denotes agent $i$'s private information about the number of agents in his
bidding club.^4 Of course, if this number is 1 then there is no coordinator
for the agent to deal with, and he will simply participate in the main auction.
Note also that agents are aware neither of the number of potential coordinators
for their auction nor of the number of actualized potential coordinators,
though they are aware of both distributions.

^4 In fact, none of our results require that agents know the number of agents
in their bidding clubs; it would be sufficient that agents know whether they
belong to a bidding club. We consider the setting where agents' signals are
more informative because deviation from the bidding club protocol is more
profitable in this case.

3.1.4  Beliefs 

Once an agent is selected, he updates his probability distribution over the
number of actual agents in the economic environment. Not all agents will have
the same beliefs: agents who have been signaled that they belong to a bidding
club will expect a larger number of agents than singleton agents. We denote by
$p^{n,k}_m$ the probability that there are a total of $m$ agents in the
auction, given that there are $n$ bidding clubs and that there are $k$ agents
in the bidder's own club; we denote the whole distribution $P^{n,k}$. Because
the numbers of agents in each bidding club are independent, observe that every
agent in the whole auction has the same beliefs about the number of other
agents in the economic environment, discounting those agents in his own
bidding club. Hence agent $i$'s beliefs are described by the distribution
$P^{n,s_i}$.
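
One way to compute such a belief distribution numerically is by repeated convolution. The sketch below assumes, following the description used later in the proof of Lemma 1, that $P^{n,k}$ is the distribution of $k$ plus $n - 1$ independent draws from $\gamma_A$; the function name and the toy `gamma_a` are illustrative assumptions.

```python
from collections import defaultdict

def total_agents_distribution(n, k, gamma_a):
    """Distribution of the total number of agents m, i.e. P^{n,k}.

    One coordinator is known to hold exactly k agents; each of the other
    n - 1 holds an independent draw from gamma_a.
    """
    dist = {k: 1.0}
    for _ in range(n - 1):
        nxt = defaultdict(float)
        for m, pm in dist.items():
            for a, pa in gamma_a.items():
                nxt[m + a] += pm * pa
        dist = dict(nxt)
    return dist

gamma_a = {1: 0.5, 2: 0.5}  # hypothetical agents-per-coordinator distribution
p_32 = total_agents_distribution(3, 2, gamma_a)
# p_32 == {4: 0.25, 5: 0.5, 6: 0.25}
```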

3.2  The  Augmented  Auction  Mechanism 

Bidding clubs, in combination with a main auction, induce an augmented auction
mechanism for their members:

1.  A  set  A  of  bidders  is  invited  to  join  the  bidding  club. 

2. Each agent $i$ sends a message $\mu_i$ to the bidding club coordinator. This
may be the null message, which indicates that $i$ will not participate in
the coordination and will instead participate freely in the main auction.
Otherwise, $i$ agrees to be bound by the bidding club rules, and $\mu_i$ is
$i$'s declared valuation for the good. Of course, $i$ can lie about his
valuation.

3.  Based  on  commonly-known  rules  and  the  information  all  the  members 
supply,  the  coordinator  selects  a  subset  of  the  agents  to  bid  in  the  main 
auction.  We  assume  that  the  coordinator  can  force  agents  to  bid  as  desired, 
e.g.  by  imposing  a  punitive  charge  on  misbehaving  agents. 

4.  The  coordinator  makes  a  payment  to  each  club  member.  The  amount  of 
the  payment  must  not  depend  on  any  of  the  agents’  declared  valuations 
or  on  the  outcome  of  the  main  auction. 

5. If a bidder represented by the coordinator wins the main auction, he is
made to pay the amount required by the auction mechanism to the auctioneer.
In addition, he may be required to make an additional payment to the
coordinator.

Any  number  of  coordinators  may  participate  in  an  auction.  However,  we 
assume  that  there  is  only  a  single  coordination  protocol,  and  that  this  protocol 
is  common  knowledge. 

4 Bidding Clubs for First-Price Auctions

This  section  contains  the  paper’s  main  technical  results.  We  begin  by  stating 
some  (mild)  assumptions  about  the  distribution  of  agent  valuations,  then  use 
these  assumptions  to  prove  a  technical  lemma.  A  second  lemma  explains  how 
we  can  show  the  existence  of  an  equilibrium  in  a  setting  where  agents  receive 
asymmetric  information  and  are  subject  to  asymmetric  payment  rules.  We 
then  give  the  bidding  club  protocol  for  first-price  auctions,  based  on  a  first- 
price  auction  with  participation  revelation  as  described  in  section  2.4.  We  show 
an  equilibrium  of  this  auction,  and  demonstrate  that  agents  gain  under  this 
equilibrium. 

4.1  Assumptions  about  F 

Our results hold for a broad class of distributions of agent valuations: all
distributions for which the following two assumptions are true.

Assumption  1  F  is  continuous  and  atomless. 

In order to give our second assumption, we must introduce some notation:

$$P_{x \ge i} = \sum_{x=i}^{\infty} p_x \qquad (4)$$

We now define the relation "<" for probability distributions:

$$P < P' \text{ iff } \exists l\, (\forall i < l,\ P_{x \ge i} = P'_{x \ge i} \text{ and } \forall i \ge l,\ P_{x \ge i} < P'_{x \ge i}). \qquad (5)$$

We are now able to state our second assumption:

Assumption 2 $(P < P')$ implies that $\forall v,\ b^e(v, P) < b^e(v, P')$.

Intuitively, we assume that every agent's symmetric equilibrium bid in $E_s$
with number of participants drawn from $P'$ is strictly greater than that
agent's symmetric equilibrium bid in $E_s$ with number of participants drawn
from $P$, in the case where $P'$ stochastically dominates $P$.^5
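
The relation in equation (5) can be checked mechanically for distributions with finite support. The sketch below makes one assumption that the text does not state: tails are compared only up to the largest index at which either distribution has mass, since beyond that point both tails are zero and a strict inequality cannot hold. The function names are illustrative.

```python
def tail(p, i):
    """P_{x >= i}: probability that at least i agents are present."""
    return sum(px for x, px in p.items() if x >= i)

def precedes(p, p_prime, tol=1e-12):
    """Check P < P' as in equation (5), restricted to indices up to the
    largest support point of either distribution (an assumption)."""
    top = max(max(p), max(p_prime))
    indices = range(0, top + 1)
    for l in indices:
        equal_below = all(abs(tail(p, i) - tail(p_prime, i)) <= tol
                          for i in indices if i < l)
        strict_above = all(tail(p, i) < tail(p_prime, i) - tol
                           for i in indices if i >= l)
        if equal_below and strict_above:
            return True
    return False

# Two toy distributions over the number of agents (hypothetical values).
p = {3: 0.5, 4: 0.5}
p_prime = {3: 0.25, 4: 0.5, 5: 0.25}
result = precedes(p, p_prime)  # True, with l = 4
```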

^5 This assumption holds for every standard distribution of independent
valuations of which we are aware.

4.2 A Technical Lemma

It is important to note that the notation $P^{n,k}$ may be seen as defining a
probability distribution over the number of agents in economic environment
$E_s$ (i.e., even without the existence of bidding clubs). It is thus possible
to discuss equilibrium bids in the classical stochastic settings where the
number of bidders is drawn from such a distribution. While it will remain to
show why these values are meaningful in our setting where (among other
differences) agents have asymmetric information, it will be useful to prove
the following lemma about the classical stochastic setting:^6

Lemma 1 $\forall k \ge 2,\ \forall n \ge 2,\ \forall v,\ b^e(v, P^{n+k-1,1}) > b^e(v, P^{n,k})$

Remark.  This  lemma  asserts  that  the  symmetric  equilibrium  bid  is  always 
higher  when  more  agents  belong  to  the  main  auction  as  singleton  bidders  and 
the  total  number  of  agents  is  held  constant. 

Proof. Recall Assumption 2 from section 4.1. We defined $P < P'$ as the
proposition that $\exists l\, (\forall i < l,\ P_{x \ge i} = P'_{x \ge i}$ and
$\forall i \ge l,\ P_{x \ge i} < P'_{x \ge i})$, and assumed that $(P < P')$
implies that $\forall v,\ b^e(v, P) < b^e(v, P')$. It is thus sufficient to
show that $P^{n+k-1,1} > P^{n,k}$. We will take $l = n + k$.

First we will show that $\forall j < n + k,\ P^{n+k-1,1}_{x \ge j} = P^{n,k}_{x \ge j}$.
The distribution $P^{n+k-1,1}$ expresses the belief that there are $n + k - 2$
potential coordinators, the membership of which is distributed as described in
section 3.1, and one potential coordinator that is known to contain only a
single bidder. The distribution $P^{n,k}$ expresses the belief that there are
$n - 1$ potential coordinators, the membership of which is again distributed as
described in section 3.1, and one potential coordinator that is known to
contain exactly $k$ bidders. Under both distributions it is certain that there
are at least $n + k - 1$ agents. Therefore
$\forall j < n + k,\ P^{n+k-1,1}_{x \ge j} = P^{n,k}_{x \ge j} = 1$.

Second, $\forall j \ge n + k,\ P^{n+k-1,1}_{x \ge j} > P^{n,k}_{x \ge j}$.
Considering $P^{n+k-1,1}$, observe that for $n + k - 2$ of the potential
coordinators the probability that this coordinator contains a single agent is
less than one and these probabilities are all independent; the last potential
coordinator contains a single agent with probability one. Considering
$P^{n,k}$, there are $n - 1$ potential coordinators where the probability of
containing a single agent is less than one, exactly as above, and $k$ potential
coordinators certain to contain exactly one agent. Thus the two distributions
agree exactly about $n - 1$ of the potential coordinators, which both hold to
contain more than a single agent, and likewise both distributions agree that
one of the potential coordinators contains exactly one agent. However, there
remain $k - 1$ potential coordinators about which the distributions disagree;
$P^{n+k-1,1}$ always generates a greater or equal number of agents for these
potential coordinators, as compared to $P^{n,k}$. Under the latter distribution
all these agents are singletons with probability one, while under the former
there is positive probability that each of the potential coordinators contains
more than one agent. As long as $k \ge 2$, there is at least one potential
coordinator for which $P^{n+k-1,1}$ stochastically dominates $P^{n,k}$. Thus
$\forall k \ge 2,\ \forall n \ge 2,\ P^{n+k-1,1} > P^{n,k}$. ■
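
As a small worked check of the two steps in the proof, consider an illustrative choice that is not taken from the paper: $\gamma_A(1) = \gamma_A(2) = 1/2$ with $n = 2$ and $k = 2$, so that $l = n + k = 4$.

```latex
\begin{align*}
% P^{2,2}: one club known to hold 2 agents plus one independent draw from \gamma_A
p^{2,2}_3 &= \tfrac12, & p^{2,2}_4 &= \tfrac12 \\
% P^{3,1}: one known singleton plus two independent draws from \gamma_A
p^{3,1}_3 &= \tfrac14, & p^{3,1}_4 &= \tfrac12, & p^{3,1}_5 &= \tfrac14 \\
% tails agree for j < 4
P^{3,1}_{x \ge 3} &= P^{2,2}_{x \ge 3} = 1 \\
% tails are strictly ordered for j = 4, 5
P^{3,1}_{x \ge 4} &= \tfrac34 > \tfrac12 = P^{2,2}_{x \ge 4}, &
P^{3,1}_{x \ge 5} &= \tfrac14 > 0 = P^{2,2}_{x \ge 5}
\end{align*}
```

In this toy case $\gamma_A$ has bounded support, so both tails vanish for $j \ge 6$; the strict comparison is meaningful only at the indices where at least one distribution still has mass.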


^6 For convenience and to preserve intuition in what follows we will refer to
the number of potential coordinators and the number of agents belonging to a
coordinator even though we concern ourselves with the economic environment
$E_s$ where bidding clubs do not exist. The number of potential coordinators is
shorthand for the number $n_c$ drawn from $\gamma_C$ in the first phase of the
procedural definition of the distribution $P^{n,k}$. Likewise the number of
agents associated with a potential coordinator is shorthand for the number of
agents chosen from one of the $n_c$ iterative draws from $\gamma_A$.

4.3 Truthful Equilibria in Asymmetric Mechanisms

In $E_{bc}$ there is informational asymmetry because agents receive different signals,
and  asymmetric  payment  rules  because  some  agents  belong  to  bidding  clubs  of 
different  sizes  and  others  do  not  belong  to  a  bidding  club  at  all.  The  lemma  in 
this  section  will  allow  us  to  go  on  to  show  an  equilibrium  in  Theorem  1  despite 
these  asymmetries. 

We  describe  a  particular  class  of  auction  mechanisms  that  are  asymmetric 
in  the  sense  that  every  agent  is  subject  to  the  same  allocation  rule  but  to  a 
potentially  different  payment  rule,  and  furthermore  that  agents  may  receive 
different  signals.  A  truth-revealing  equilibrium  exists  in  such  auctions  when  the 
following  conditions  hold: 

1.  The  auction  allocates  the  good  to  the  agent  who  submits  the  highest  bid. 

2. Consider the auction $M_i$ in which all agents are subject to agent $i$'s
payment rule and the above allocation rule, and where (hypothetically) all
agents receive the signal $s_i$.^7 Truth-revelation is a symmetric
equilibrium in $M_i$.


Observe that the second condition above is less restrictive than it may
appear. From the revelation principle we can see that for every auction with a
symmetric equilibrium there is a corresponding auction in which truth-revealing
is an equilibrium that gives rise to the same allocation and the same payments
for all agents. $M_i$ can thus be seen as a revelation mechanism for any other
auction that has a symmetric equilibrium.

Definition 1 $M$ is a regular asymmetric auction if it has the following
structure, where $\mathbf{M}$ represents a set of auctions
$\{M_1, \ldots, M_n\}$ which each allocate the good to the agent who submits
the highest bid, and which are all truth-revealing direct mechanisms for $n$
risk-neutral agents with independent private valuations drawn from the same
distribution:

1. Each agent $i$ sends a message $\mu_i$ to the center.

2. The center allocates the good to the agent $i$ with
$\mu_i \in \max_j \mu_j$. If multiple agents submit the highest message, the
tie is broken in some arbitrary way.

3. Agent $i$ is made to transfer $t_i(\mu, n)$ to the center. The transfer
function $t_i$ is taken from $M_i \in \mathbf{M}$.
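
The structure in Definition 1 can be sketched in code. The example below is a minimal illustration under stated assumptions: the two transfer rules (a second-price rule, and a second-price rule with a flat rebate) are hypothetical choices of truth-revealing payment rules, and ties are broken in favor of the lowest index.

```python
def run_regular_asymmetric_auction(messages, transfer_fns):
    """Allocate to the highest message; charge each agent by its own rule.

    messages: list of declared valuations mu_i
    transfer_fns: list of callables t_i(messages, n) -> transfer by agent i
    Returns (winner index, list of transfers to the center).
    """
    n = len(messages)
    # Arbitrary tie-breaking: max() returns the lowest index among ties.
    winner = max(range(n), key=lambda i: messages[i])
    transfers = [transfer_fns[i](messages, n) for i in range(n)]
    return winner, transfers

def second_price(i, rebate=0.0):
    """Hypothetical payment rule: pay the highest other message on winning,
    minus a flat rebate that does not depend on agent i's own message."""
    def t(messages, n):
        best_other = max(m for j, m in enumerate(messages) if j != i)
        price = best_other if messages[i] > best_other else 0.0
        return price - rebate
    return t

messages = [0.9, 0.4, 0.7]
rules = [second_price(0), second_price(1, rebate=0.05), second_price(2)]
winner, transfers = run_regular_asymmetric_auction(messages, rules)
# winner == 0, transfers == [0.7, -0.05, 0.0]
```

All agents face the same allocation rule, while agent 1 faces a different payment rule from the others, which is the kind of asymmetry the definition permits.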

Lemma 2 Truth-revelation is an equilibrium of regular asymmetric auctions.

Proof. The payoff of agent $i$ is uniquely determined by the allocation rule,
the transfer function $t_i$, and all agents' strategies. Assume that the other
agents are truth revealing; then the other agents' behavior, the allocation
rule, and agent $i$'s payment rule are all identical in $M$ and $M_i$. Since
truth-revelation is an equilibrium in $M_i$, truth-revelation is agent $i$'s
best response in $M$. ■

^7 That is, for every agent $j$ in the real auction, we create an agent $k$ in
the hypothetical auction $M_i$ having type $\tau_k = (v_j, s_i)$.

The next corollary, following directly from Lemma 2, compares a single
agent's expected utility under two different auctions which implement
different payment rules. We will need this result for our proof of Theorem 1.

Corollary  1  Consider  two  regular  asymmetric  auctions  M  and  M' ,  which  both 
implement  the  same  transfer  function  for  agent  i.  In  equilibrium,  agent  i’s 
expected  utility  is  the  same  in  both  M  and  M' . 

Proof.  The  payoff  of  agent  i  is  uniquely  determined  by  the  allocation  rule, 
its  transfer  function,  and  all  agents’  strategies.  Both  M  and  M'  have  the  same 
allocation  rule.  Lemma  2  tells  us  that  truth  revelation  is  a  best  response  for 
all  agents  in  both  M  and  M',  so  all  agents’  strategies  are  identical  in  the  two 
auctions.  In  general,  agents  may  not  receive  the  same  expected  utility  from  M 
and  M' .  However,  since  i  has  the  same  transfer  function  in  both  auctions,  i’s 
expected  utility  in  M  is  equal  to  his  expected  utility  in  M' .  ■ 


4.4  First-Price  Auction  Bidding  Club  Protocol 

What  follows  is  the  protocol  of  a  coordinator  who  approaches  k  agents. 

1. Each agent $i$ sends a message $\mu_i$ to the coordinator.

2. If at least one agent declines participation then the coordinator registers
in the main auction for every agent who accepted the invitation to the
bidding club. For each bidder $i$, the coordinator submits a bid of
$b^e(\mu_i, P^{n,k})$, where $n$ is the number of bidders announced by the
auctioneer.

3. If all $k$ agents accepted the invitation then the coordinator drops all
bidders except the bidder with the highest reported valuation, whom we
will denote as bidder $h$. For this bidder the coordinator places a bid of
$b^e(\mu_h, P^{n,1})$ in the main auction.

4. The coordinator pays each member a pre-determined payment $c \ge 0$
whenever all bidders participate in the club, and regardless of the outcome
of the auction and of how much each bidder bid. Following the argument in
[3] let $g$ be the coordinator's ex ante expected gain if all agents behave
according to the equilibrium in Theorem 1; the coordinator will not lose
money on expectation if it pays each agent $c = \frac{1}{k}(g - d)$ with
$0 \le d \le g$.

5. If bidder $h$ wins in the main auction, he is made to pay
$b^e(\mu_h, P^{n,1})$ to the center and
$b^e(\mu_h, P^{n,k}) - b^e(\mu_h, P^{n,1})$ to the coordinator.

Observe  that  in  equilibrium  the  coordinator  has  an  expected  profit  of  d , 
though  it  will  lose  kc  whenever  the  winner  of  the  main  auction  does  not  belong 
to  its  club.  If  a  coordinator  wanted  to  be  budget-balanced  on  expectation  rather 
than  profitable  on  expectation,  it  could  set  d  =  0. 
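
The coordinator's steps can be sketched as a single function. This is a minimal illustration under stated assumptions: `bid_fn(valuation, n, s)` is a placeholder for $b^e(\text{valuation}, P^{n,s})$, and the `toy_bid` used in the example is a made-up increasing function, not the equilibrium bid function of the paper. The values of `g` and `d` are likewise hypothetical.

```python
def coordinator_protocol(messages, n, bid_fn, g, d):
    """Protocol of one coordinator approaching k = len(messages) agents.

    messages: None for an agent who declined, else the declared valuation mu_i
    n: number of bidders announced by the auctioneer
    bid_fn(valuation, n, s): placeholder for b^e(valuation, P^{n,s})
    g, d: expected gain and retained profit, with 0 <= d <= g
    """
    k = len(messages)
    accepted = {i: mu for i, mu in enumerate(messages) if mu is not None}
    if len(accepted) < k:
        # Step 2: someone declined, so each accepting agent is entered
        # in the main auction with a bid of b^e(mu_i, P^{n,k}).
        bids = {i: bid_fn(mu, n, k) for i, mu in accepted.items()}
        return {"bids": bids, "payment_per_member": 0.0, "surcharge": 0.0}
    # Step 3: only the agent with the highest declared valuation bids.
    h = max(accepted, key=accepted.get)
    main_bid = bid_fn(accepted[h], n, 1)
    # Step 4: each member receives c = (g - d) / k.
    c = (g - d) / k
    # Step 5: if h wins, he owes the coordinator this difference.
    surcharge = bid_fn(accepted[h], n, k) - main_bid
    return {"bids": {h: main_bid}, "payment_per_member": c,
            "surcharge": surcharge}

def toy_bid(v, n, s):
    # Hypothetical stand-in: increasing in both n and s.
    return v * (n + s - 2) / (n + s - 1)

full = coordinator_protocol([0.6, 0.8], n=3, bid_fn=toy_bid, g=0.04, d=0.0)
partial = coordinator_protocol([None, 0.8], n=3, bid_fn=toy_bid, g=0.04, d=0.0)
```

In `full`, only agent 1 is entered in the main auction; in `partial`, the club is effectively disbanded and no payment $c$ is made.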

We  are  now  ready  to  prove  the  main  theorem  of  the  paper: 

Theorem 1 It is an equilibrium for all bidding club members to choose to
participate and to truthfully declare their valuations to their respective
bidding club coordinators, and for all non-bidding club members to participate
in the main auction with a bid of $b^e(v, P^{n,1})$.

Proof.  We  first  prove  that  the  above  strategy  is  in  equilibrium  for  both 
categories  of  bidders  assuming  that  agents  all  participate;  we  then  prove  that 
participation  is  rational  for  all  agents. 

For the proof of equilibrium we consider a one-stage mechanism which behaves
as follows:

1.  The  center  announces  n,  the  number  of  bidders  in  the  main  auction. 

2.  Bidders  submit  bids  (messages)  to  the  mechanism. 

3.  The  bidder  with  the  highest  bid  is  allocated  the  good. 

4. The winning bidder is made to pay $b^e(v_i, P^{n,s_i}) - c$.

5. All non-winning bidding club members are paid $c$.

This one-stage mechanism has the same payment rule for bidding club bidders
as the bidding club protocol given above, but no longer implements a
first-price payment rule for singleton bidders. In order to prove that the
strategies given in the statement of the theorem are an equilibrium, it is
sufficient to show that truthful bidding is an equilibrium for all bidders
under the given one-stage mechanism. Observe that this mechanism may be seen as
a mechanism $M$ in the sense of Lemma 2: it allocates the good to the agent who
submits the highest message, and (by definition of $b^e$) the auction $M_i$ in
which all agents are subject to agent $i$'s payment rule and receive the signal
$s_i$ has truth revelation as a symmetric equilibrium.

Strategy of non-club bidder: Assume that all bidding club agents (if any)
bid truthfully. Further assume that all non-club agents also bid truthfully
except for non-club bidder $i$. The probability distribution $P^{n,1}$
correctly describes the distribution of the number of agents faced by $i$,
given his signal $s_i = 1$ and the auctioneer's announcement that there are $n$
bidders in the main auction. Although agents in bidding clubs have additional
information about the number of agents (each agent knows that there is at least
one other agent in his own club), their prescribed behavior is to place bids of
$b^e(\mu, P^{n,1})$ in the main auction. Agent $i$ thus faces an unknown number
of agents distributed according to $P^{n,1}$ and all bidding
$b^e(v, P^{n,1})$. The auction is regular asymmetric: using the result from
Lemma 2, $i$'s strategic decision is the same as under a mechanism where all
agents are subject to his payment rule and share his signal $s_i$, and with a
stochastic number of bidders distributed according to $P^{n,1}$. In particular,
it does not matter that the club members are subject to different payment rules
and have additional information, and so $i$ will also bid $b^e(v, P^{n,1})$.

Strategy  of  club  bidder:  Assume  that  all  agents  accept  the  invitation  to  join 
their  respective  clubs  and  then  truthfully  declare  their  valuations,  excluding 
club bidder $i$ who decides to participate but considers his bid. Once again,
observe that the auction is regular asymmetric, and so Lemma 2 applies:
$P^{n,k}$ describes the distribution over the number of agents conditioned on
$i$'s signal $s_i = k$, and the bidder submitting the highest (global) message
will always be allocated the good. Therefore truthful bidding is a best
response for agent $i$, despite the information asymmetry. Because $i$ gets the
payment $c$ regardless of the amount of his bid, the presence or absence of
this payment has no effect on his choice of what amount to bid given the
decision to participate.

We  now  turn  to  the  question  of  participation;  for  this  part  of  the  proof  we 
consider  the  original,  multi-stage  mechanism. 

Participation  of  non-club  bidder:  Because  there  is  no  participation  fee,  it 
is  always  rational  for  a  bidder  to  participate  in  a  first-price  auction. 

Participation of club bidder: Assume that $c = 0$; clearly $c > 0$ only
increases agents' incentive to participate in a bidding club. Because there is
no participation fee, all bidding club bidders will participate in the auction,
but must decide whether or not to accept their coordinators' invitations.
Assume that all agents except for $i$ join their respective clubs and bid
truthfully, and agent $i$ must decide whether or not to join his bidding club.
Agent $i$ knows the number of agents in his bidding club and updates his
distribution over the number of agents in the whole auction as $P^{n,k}$.

Consider the classical stochastic case where all bidders have the same
information as $i$ (and are subject to the same payment rules): from
proposition 3 it is a best response for $i$ to bid $b^e(v_i, P^{n,k})$. In this
setting $i$'s expected gain is the same as in the equilibrium of the one-stage
mechanism from the first part of the proof where all bidding club members
(including $i$) join their clubs and bid truthfully (with $c = 0$), by
Corollary 1.

As a result of $i$ declining the offer to participate in the bidding club there
are $n - 1$ bidders in the main auction placing bids of $b^e(v, P^{n+k-1,1})$
and $k - 1$ other bidders placing bids of $b^e(v, P^{n,k})$. We know from
Lemma 1 that $b^e(v, P^{n+k-1,1}) > b^e(v, P^{n,k})$. Thus the singleton
bidders and other bidding clubs will bid a higher function^8 of their
valuations than the bidders from the disbanded bidding club. It always reduces
a bidder's expected gain in a first-price auction to cause other bidders to bid
above the equilibrium, because it reduces the chance that he will win without
affecting his payment if he does win. This is exactly the effect of $i$
declining the offer to join his bidding club: the $k - 1$ other bidders from
$i$'s bidding club bid according to the equilibrium of the classical stochastic
case discussed above, but the $n - 1$ singleton and bidding club bidders submit
bids that exceed the symmetric equilibrium amount. Therefore $i$'s expected
gain is smaller if he declines the offer to participate than if he accepts
it. ■


^8 Note that this occurs because the singleton bidders and other bidding clubs
in the main auction follow a strategy that depends on the number of bidders
announced by the auctioneer; hence they bid as though all the $k - 1$ bidders
from the disbanded bidding club might each be independent bidding clubs.

4.5 Do bidding clubs cause agents to gain?

All  things  being  equal,  bidders  are  better  off  being  invited  to  a  bidding  club 
than  being  sent  to  the  auction  as  singleton  bidders.  Intuitively,  an  agent  gains 
by  not  having  to  consider  the  possibility  that  other  bidders  who  would  otherwise 
have  belonged  to  his  bidding  club  might  themselves  be  bidding  clubs. 

Theorem 2 An agent $i$ has higher expected utility in a bidding club of size
$k$ bidding as described in Theorem 1 than he does if the bidding club does not
exist and $k$ additional agents (including $i$) participate directly in the
main auction as singleton bidders, again bidding as described in Theorem 1, for
$c \ge 0$.

Proof. Consider the counterfactual case where agent $i$'s bidding club does not
exist, and all the members of this bidding club are replaced by singleton
bidders in the main auction. We will show that $i$ is better off as a member of
the bidding club (even when $c = 0$) than in this case. If there were $n$
potential coordinators in the original auction and $k$ agents in $i$'s bidding
club, then the auctioneer would announce $n + k - 1$ as the number of
participants in the new auction. Under the equilibrium from Theorem 1, as a
singleton bidder $i$ will bid $b^e(v_i, P^{n+k-1,1})$; if he belonged to the
bidding club and followed the same equilibrium $i$ would bid
$b^e(v_i, P^{n,k})$. In both cases the auction is economically efficient, which
means $i$ is better off in the auction that requires him to pay a smaller
amount when he wins. Lemma 1 shows that
$\forall k \ge 2,\ \forall n \ge 2,\ \forall v,\ b^e(v, P^{n+k-1,1}) > b^e(v, P^{n,k})$,
and so our result follows. ■

We  can  also  show  that  singleton  bidders  and  members  of  other  bidding  clubs 
benefit  from  the  existence  of  each  bidding  club  in  the  same  sense.  Following  an 
argument  similar  to  the  one  in  Theorem  2,  other  bidders  gain  from  not  having  to 
consider  the  possibility  that  additional  bidders  might  represent  bidding  clubs. 
Paradoxically,  as  long  as  c'  >  0,  other  bidders’  gain  from  the  existence  of  a  given 
bidding  club  is  greater  than  the  gain  of  that  club’s  members. 

Corollary 2 In the equilibrium described in Theorem 1, singleton bidders and
members of other bidding clubs have higher expected utility when other agents
participate in a given bidding club of size $k \ge 2$, as compared to a case
where $k$ additional agents participate directly in the main auction as
singleton bidders.

Proof. Consider a singleton bidder in the first case, where the club of $k$
agents does exist. (It is sufficient to consider a singleton bidder, since
other bidding clubs bid in the same way as singleton bidders.) Following the
equilibrium from Theorem 1 this agent would submit the bid
$b^e(v_i, P^{n,1})$. Theorem 2 shows that it is better to belong to a bidding
club (and thus to bid $b^e(v_i, P^{n,k})$) than to be a singleton bidder in an
auction with the same number of agents (and thus to bid
$b^e(v_i, P^{n+k-1,1})$). Since the distribution $P^{n,k}$ is just $P^{n,1}$
with $k - 1$ singleton agents added,
$\forall k \ge 2,\ b^e(v_i, P^{n,1}) < b^e(v_i, P^{n,k})$. Thus
$\forall k \ge 2,\ b^e(v_i, P^{n,1}) < b^e(v_i, P^{n+k-1,1})$. ■
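
The ordering of the three bids used in this proof can be checked numerically in a toy case. The sketch below rests on several assumptions that are not part of the text: valuations are uniform on $[0, 1]$, so that $F(v) = v$ and the fixed-$j$ equilibrium bid is $(j-1)v/j$; $b^e(v, P)$ is taken to be the weighted average of fixed-$j$ bids that the text attributes to equation (3); and the three distributions are built from a hypothetical $\gamma_A(1) = \gamma_A(2) = 1/2$ with $n = 2$, $k = 2$.

```python
def fixed_bid(v, j):
    """First-price equilibrium bid with j bidders, uniform [0, 1] values."""
    return (j - 1) / j * v

def stochastic_bid(v, p):
    """Assumed form of b^e(v, P): average of fixed-j bids, weighted by
    p_j * F(v)^(j-1) with F(v) = v."""
    weights = {j: pj * v ** (j - 1) for j, pj in p.items()}
    total = sum(weights.values())
    return sum(w * fixed_bid(v, j) for j, w in weights.items()) / total

# Distributions over the total number of agents (toy values).
p_n_1 = {2: 0.5, 3: 0.5}               # P^{2,1}
p_n_k = {3: 0.5, 4: 0.5}               # P^{2,2}
p_nk_1 = {3: 0.25, 4: 0.5, 5: 0.25}    # P^{3,1}

v = 0.5
bids = (stochastic_bid(v, p_n_1),
        stochastic_bid(v, p_n_k),
        stochastic_bid(v, p_nk_1))
```

For this toy case the three bids come out strictly increasing, matching the chain $b^e(v_i, P^{n,1}) < b^e(v_i, P^{n,k}) < b^e(v_i, P^{n+k-1,1})$.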

Finally, we can show that agents prefer participating in $E_{bc}$ in the
equilibrium from Theorem 1 in a bidding club of size $k$ (thus, where the
number of agents is distributed according to $P^{n,k}$) to participating in
$E_s$ with number of bidders distributed according to $P^{n,k}$, as long as
$c > 0$.

Theorem 3 For all $\tau_i \in T$, for all $k \ge 2$, for all $n \ge 2$, for all
$c > 0$, agent $i$ obtains smaller expected utility by:

1. participating in a first-price auction with participation revelation in
$E_s$ with number of bidders distributed according to $P^{n,k}$; than by

2. participating in a bidding club of size $k$ in $E_{bc}$ and following the
equilibrium from Theorem 1.

When $c = 0$, agent $i$ obtains the same expected utility in both cases.

Proof. For any efficient auction, an agent $i$'s expected utility $EU_i$ is
$\sum_j p_j F^{j-1}(v_i)\, b$, where $p_j$ is the probability that there are a
total of $j$ agents in the economic environment, $F^{j-1}(v_i)$ is the
probability that $i$ has the high valuation among these $j$ agents, and $b$ is
the amount of $i$'s bid.

First, we consider case (1). From proposition 4 it is an equilibrium for
agent $i$ in economic environment $E_s$ to bid $b^e(v_i, j)$ in a first-price
auction with participation revelation, where $j$ is the number of bidders
announced by the auctioneer. Since the number of agents is distributed
according to $P^{n,k}$, agent $i$'s expected utility in a first-price auction
with participation revelation is:

$$EU_{i,pr} = \sum_{j} p^{n,k}_j F^{j-1}(v_i)\, b^e(v_i, j) \qquad (6)$$

$$= \frac{\sum_{\ell} p^{n,k}_\ell F^{\ell-1}(v_i)}{\sum_{\ell} p^{n,k}_\ell F^{\ell-1}(v_i)} \sum_{j} p^{n,k}_j F^{j-1}(v_i)\, b^e(v_i, j)$$

$$= \sum_{\ell} p^{n,k}_\ell F^{\ell-1}(v_i) \left( \sum_{j} \frac{p^{n,k}_j F^{j-1}(v_i)}{\sum_{\ell'} p^{n,k}_{\ell'} F^{\ell'-1}(v_i)}\, b^e(v_i, j) \right)$$

$$= \sum_{\ell} p^{n,k}_\ell F^{\ell-1}(v_i)\, b^e(v_i, P^{n,k}) = EU_{i,s} \qquad (7)$$

Equation (7) is agent $i$'s expected utility in a first-price auction with a
stochastic number of participants, which we shall denote $EU_{i,s}$. Observe
that we make use of the definition of $b^e(v_i, P)$ from equation (3).

We now consider case (2). Let $EU_{i,bc}$ denote agent $i$'s expected utility
in $E_{bc}$ as a member of a bidding club of size $k$, in the equilibrium from
Theorem 1. Recall that in this equilibrium the bidder with the globally highest
valuation always wins, and that all agents in bidding clubs of size $k$ bid
$b^e(v_i, P^{n,k})$ and receive a positive payment of $c$, which does not
depend on the amount of their bids or on whether any agent in the club wins the
auction.

$$EU_{i,bc} = \sum_{j} p^{n,k}_j F^{j-1}(v_i)\, b^e(v_i, P^{n,k}) + c \qquad (8)$$

Combining equations (7) and (8), we get:

$$EU_{i,bc} - EU_{i,pr} = c \qquad (9)$$

When  c  >  0,  agent  i’s  expected  utility  is  strictly  greater  in  case  (2)  than  in 
case  (1);  when  c  =  0  he  has  the  same  expected  utility  in  both  cases.  ■ 
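
The algebra behind equations (6) through (9) can be verified numerically. The sketch below mirrors the sums exactly as they are written in the proof and relies on the same assumptions as any toy instance would: uniform $[0, 1]$ valuations (so $F(v) = v$ and the fixed-$j$ bid is $(j-1)v/j$), an assumed weighted-average form for $b^e(v, P)$ as attributed to equation (3), and hypothetical values for the distribution and for $c$.

```python
def fixed_bid(v, j):
    """b^e(v, j) for uniform [0, 1] valuations (an assumed example)."""
    return (j - 1) / j * v

def weights(v, p):
    """The terms p_j * F(v)^(j-1), with F(v) = v."""
    return {j: pj * v ** (j - 1) for j, pj in p.items()}

def stochastic_bid(v, p):
    """Assumed form of b^e(v, P): weighted average of fixed-j bids."""
    w = weights(v, p)
    return sum(wj * fixed_bid(v, j) for j, wj in w.items()) / sum(w.values())

def eq6(v, p):
    """Right-hand side of equation (6)."""
    return sum(wj * fixed_bid(v, j) for j, wj in weights(v, p).items())

def eq7(v, p):
    """Last line of equation (7)."""
    return sum(weights(v, p).values()) * stochastic_bid(v, p)

def eq8(v, p, c):
    """Right-hand side of equation (8)."""
    return eq7(v, p) + c

p_nk = {3: 0.5, 4: 0.5}  # toy stand-in for P^{n,k}
v, c = 0.7, 0.01
gap = eq8(v, p_nk, c) - eq6(v, p_nk)  # equation (9): equals c up to rounding
```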

What  about  agents  who  do  not  belong  to  bidding  clubs?  We  can  show 
in  the  same  way  that  they  are  not  harmed  by  the  existence  of  bidding  clubs: 
they  are  neither  better  nor  worse  off  in  the  bidding  club  economic  environment 
than  facing  the  same  distribution  of  opponents  in  a  first-price  auction  with 
participation  revelation. 

Corollary 3 For all $\tau_i \in T$, for all $n \ge 2$, agent $i$ obtains the
same expected utility by:

1. participating in a first-price auction with participation revelation in
$E_s$ with number of bidders distributed according to $P^{n,1}$; as by

2. participating as a singleton bidder in $E_{bc}$ and following the
equilibrium from Theorem 1.

Proof. We follow the same argument as in Theorem 3, except that $k = 1$
and $EU_{i,bc}$ does not include $c$. Thus we get $EU_{i,bc} = EU_{i,pr}$. ■


5  Discussion 

In  this  section  we  consider  the  trustworthiness  and  legality  of  coordinators,  and 
also  discuss  two  ways  for  auctioneers  to  disrupt  bidding  clubs  in  their  auctions. 

5.1  Trust 

Why  would  a  bidding  club  coordinator  be  willing  to  provide  reliable  service,  and 
likewise  why  would  bidders  have  reason  to  trust  a  coordinator?  For  example,  a 
malicious  coordination  protocol  could  be  used  simply  to  drop  all  its  members 
from  the  auction  and  reduce  competition.  While  this  is  a  reasonable  concern,  our 
coordinators  make  a  profit  on  expectation,  thus  providing  incentive  for  a  trusted 
third  party  to  run  a  reliable  coordination  service.  Indeed,  coordinators  would  be 
very  inexpensive  to  run:  as  their  behavior  is  entirely  deterministic,  they  could 
operate  without  any  human  supervision.  The  establishment  of  trust  is  exogenous 
to  our  model;  we  have  simply  assumed  that  all  agents  trust  coordinators  and 
that  all  coordinators  are  honest. 

5.2 Legality

We  have  often  been  asked  about  the  legal  issues  surrounding  the  use  of  bidding 
clubs.  While  this  is  an  interesting  and  pertinent  question,  it  exceeds  both  our 
expertise  and  the  scope  of  this  paper.  We  should  note,  however,  that  uses  of 
bidding  clubs  exist  that  might  not  fall  under  the  legal  definition  of  collusion.  For 
example,  a  corporation  could  use  a  bidding  club  to  choose  one  of  its  departments 
to  bid  in  an  external  auction.  In  this  way  the  corporation  could  be  sure  to  avoid 
bidding  against  itself  in  the  external  auction  while  avoiding  dictatorship  and 
respecting  each  department’s  self-interest.  Coordinators  may  also  be  permitted 
by  the  auctioneer:  e.g.,  by  an  internet  market  seeking  to  attract  more  bidders 
to  its  site. 

5.3  Disrupting  Bidding  Clubs 

There  are  two  things  an  auctioneer  can  do  to  disrupt  bidding  clubs  in  a  first- 
price  auction.  First,  she  can  permit  “false-name  bidding.”  (Our  auction  model 
has  assumed  that  each  agent  may  place  only  a  single  bid  in  the  auction,  and  that 
the  center  has  a  way  of  uniquely  identifying  agents.)  Second,  she  can  refrain 
from  publicly  disclosing  the  winner  of  the  auction. 

If  bidders  can  bid  both  in  their  bidding  clubs  and  in  the  main  auction,  they 
are  better  off  deviating  from  the  equilibrium  in  Theorem  1  in  the  following  way. 
A  bidder  i  can  accept  the  invitation  to  join  the  bidding  club  but  place  a  very  low 
bid  with  the  coordinator;  at  the  same  time,  i  can  directly  submit  a  competitive 
bid  in  the  main  auction.  Agent  i  will  gain  by  following  this  strategy  when 
all  other  agents  follow  the  strategies  specified  in  Theorem  1  because  accepting 
the  invitation  to  join  the  bidding  club  ensures  that  the  club  does  drop  all  but 
one  of  its  members  and  also  causes  the  high  bidder  to  bid  less  than  he  would 
if  he  were  not  bound  to  the  coordination  protocol.  If  the  bidding  club  drops 
any  bidders  other  than  i  then  all  agents’  bids  will  also  be  lowered  because  the 
number  of  participants  announced  by  the  auctioneer  will  be  smaller,  compared 
to  the  case  where  the  bidding  club  did  not  exist  or  where  it  was  disbanded. 
However,  if  false-name  bidding  is  impossible  and  the  winner  of  the  auction  is 
publicly  disclosed  then  the  bidding  club  coordinator  can  detect  an  agent  who  has 
deviated  in  this  way.  Because  the  agent  has  agreed  to  participate  in  the  bidding 
club  the  coordinator  has  the  power  to  punish  this  agent  and  make  the  deviation 
unprofitable. If either or both of these requirements do not hold, however,
the  coordinator  will  be  unable  to  detect  defection  and  so  the  equilibrium  from 
Theorem  1  will  not  hold. 


6  Conclusion 

We  have  presented  a  formal  model  of  bidding  clubs  which  in  many  ways  extends 
models  traditionally  used  in  the  study  of  collusion;  most  importantly,  all  agents 
behave strategically based on correct information about the economic environment,
including the possibility that other agents will collude. Other features
of our setting include a stochastic number of agents and of bidding clubs in
each  auction,  and  revelation  by  the  auctioneer  of  the  number  of  bids  received. 
The  strategy  space  is  expanded  so  that  the  decision  of  whether  or  not  to  join  a 
bidding  club  is  part  of  an  agent’s  choice  of  strategy.  Bidding  clubs  make  money 
on  expectation,  and  can  optionally  be  configured  so  they  never  lose  money. 
We have shown a bidding club protocol for first-price auctions that leads to
a  (globally)  efficient  allocation  in  equilibrium,  and  which  does  not  make  use  of 
side-payments  in  the  case  of  c  =  0.  There  are  three  ways  of  asking  the  question 
of  whether  agents  gain  by  participating  in  bidding  clubs  in  first-price  auctions: 

1.  Could  any  agent  gain  by  deviating  from  the  protocol? 

2.  Would  any  agent  be  better  off  if  his  bidding  club  did  not  exist? 

3. Would any agent be better off in an economic environment that did
not  include  bidding  clubs  at  all? 

We  have  shown  that  agents  are  strictly  better  off  in  all  three  senses.  (In  the 
third  sense,  the  gain  is  only  strict  when  c  >  0.)  We  have  also  shown  that  each 
bidding  club  causes  non-members  to  gain  in  the  second  sense,  and  does  not  hurt 
them  in  the  third  sense.  Finally,  we  have  discussed  ways  for  an  auctioneer  to 
set  up  the  rules  of  her  auction  so  as  to  disrupt  the  operation  of  bidding  clubs. 


Abstract

For  the  problem  of  online  real-time  scheduling  of  jobs  on  a  single  processor,  previous  work 
presents  matching  upper  and  lower  bounds  on  the  competitive  ratio  that  can  be  achieved  by 
a  deterministic  algorithm.  However,  these  results  only  apply  to  the  non-strategic  setting  in 
which  the  jobs  are  released  directly  to  the  algorithm.  Motivated  by  emerging  areas  such  as  grid 
computing,  we  instead  consider  this  problem  in  an  economic  setting,  in  which  each  job  is  released 
to  a  separate,  self-interested  agent.  The  agent  can  then  delay  releasing  the  job  to  the  algorithm, 
inflate  its  length,  and  declare  an  arbitrary  value  and  deadline  for  the  job,  while  the  center 
determines  not  only  the  schedule,  but  the  payment  of  each  agent.  For  the  resulting  mechanism 
design  problem  (in  which  we  also  slightly  strengthen  an  assumption  from  the  non-strategic 
setting),  we  present  a  mechanism  that  addresses  each  incentive  issue,  while  only  increasing  the 
competitive  ratio  by  one.  We  then  show  a  matching  lower  bound  for  deterministic  mechanisms 
that  never  pay  the  agents. 


1  Introduction 

We  consider  the  problem  of  online  scheduling  of  jobs  on  a  single  processor.  Each  job  is  characterized 
by  a  release  time,  a  deadline,  a  processing  time,  and  a  value  for  successful  completion  by  its  deadline. 
The  objective  is  to  maximize  the  sum  of  the  values  of  the  jobs  completed  by  their  respective  deadlines. 
The  key  challenge  in  this  online  setting  is  that  the  schedule  must  be  constructed  in  real-time,  even 
though  nothing  is  known  about  a  job  until  its  release  time. 

Competitive  analysis  [5,  9],  with  its  roots  in  [11],  is  a  well-studied  approach  for  analyzing  online 
algorithms  by  comparing  them  against  the  optimal  offline  algorithm,  which  has  full  knowledge  of  the 
input  at  the  beginning  of  its  execution.  One  interpretation  of  this  approach  is  as  a  game  between  the 
designer  of  the  online  algorithm  and  an  adversary.  First,  the  designer  selects  the  online  algorithm. 
Then,  the  adversary  observes  the  algorithm  and  selects  the  sequence  of  jobs  that  maximizes  the 
competitive  ratio:  the  ratio  of  the  value  of  the  jobs  completed  by  an  optimal  offline  algorithm  to  the 
value  of  those  completed  by  the  online  algorithm. 

Two  papers  paint  a  complete  picture  in  terms  of  competitive  analysis  for  this  setting,  in  which 
the  algorithm  is  assumed  to  know  k,  the  maximum  ratio  between  the  value  densities  (value  divided 
by  processing  time)  of  any  two  jobs.  For  k  =  1,  [3]  presents  a  4-competitive  algorithm,  and  proves 
that this is a lower bound on the competitive ratio for deterministic algorithms. The same paper
also generalizes the lower bound to $(1 + \sqrt{k})^2$ for any $k \geq 1$, and [14] then presents a matching
$(1 + \sqrt{k})^2$-competitive algorithm.

*The author would like to acknowledge Yoav Shoham for his influential comments on drafts of the paper, and Yossi
Azar for a useful discussion. This work was supported in part by DARPA grant F30602-00-2-0598.

The  setting  addressed  by  these  papers  is  completely  non-strategic,  and  the  algorithm  is  assumed 
to  always  know  the  true  characteristics  of  each  job  upon  its  release.  However,  in  domains  such  as 
grid  computing  (see,  for  example,  [6,  7])  this  assumption  is  invalid,  because  buyers  of  processor  time 
choose  when  and  how  to  submit  their  jobs.  Furthermore,  sellers  not  only  schedule  jobs  but  also 
determine  the  amount  that  they  charge  buyers,  an  issue  not  addressed  in  the  non-strategic  setting. 

Thus,  we  consider  an  extension  of  the  setting  in  which  each  job  is  owned  by  a  separate,  self- 
interested  agent.  Instead  of  being  released  to  the  algorithm,  each  job  is  now  released  only  to  its 
owning  agent.  Each  agent  now  has  four  different  ways  in  which  it  can  manipulate  the  algorithm: 
it  decides  when  to  submit  the  job  to  the  algorithm  after  the  true  release  time,  it  can  artificially 
inflate  the  length  of  the  job,  and  it  can  declare  an  arbitrary  value  and  deadline  for  the  job.  Because 
the  agents  are  self-interested,  they  will  choose  to  manipulate  the  algorithm  if  doing  so  will  cause 
their  job  to  be  completed;  and,  indeed,  one  can  find  examples  in  which  agents  have  incentive  to 
manipulate  the  algorithms  presented  in  [3]  and  [14] . 

The  addition  of  self-interested  agents  moves  the  problem  from  the  area  of  algorithm  design  to 
that  of  mechanism  design.  In  this  setting,  a  mechanism  will  take  as  input  a  job  from  each  agent, 
and  return  a  schedule  for  the  jobs  and  a  payment  to  be  made  by  each  agent  to  the  center.  The 
mechanism  design  goal  of  incentive  compatibility  requires  that  it  is  always  in  each  agent’s  best 
interests  to  immediately  submit  its  job  upon  release,  and  to  truthfully  declare  its  value,  length,  and 
deadline. 

In  order  to  evaluate  a  mechanism  using  competitive  analysis,  the  adversary  model  must  be 
updated.  In  the  new  model,  the  adversary  still  determines  the  sequence  of  jobs,  but  it  is  the  self- 
interested  agents  who  determine  the  observed  input  of  the  mechanism.  Thus,  in  order  to  achieve  a 
competitive  ratio  of  c,  an  online  mechanism  must  both  be  incentive  compatible,  and  always  achieve 
at least $\frac{1}{c}$ of the value that the optimal offline mechanism achieves on the same sequence of jobs.

The  rest  of  the  paper  is  structured  as  follows.  In  Section  2,  we  formally  define  and  review  results 
from  the  original,  non-strategic  setting.  After  introducing  the  incentive  issues  through  an  example, 
we  formalize  the  mechanism  design  setting  in  Section  3.  In  Section  4  we  present  our  first  main  result, 
a $((1 + \sqrt{k})^2 + 1)$-competitive mechanism. We also show how we can simplify this mechanism for
the special case in which $k = 1$ and each agent cannot alter the length of its job. Returning to the
general setting, we show in Section 5 that, for any $k \geq 1$, this competitive ratio is a lower bound for
deterministic  mechanisms  that  do  not  pay  agents.  All  formal  proofs  are  delayed  to  the  appendix. 
Finally,  in  Section  6,  we  discuss  related  work  other  than  the  directly  relevant  [3]  and  [14],  before 
concluding  with  Section  7. 


2  Non-Strategic  Setting 

In  this  section,  we  formally  define  the  original,  non-strategic  setting,  and  recap  previous  results. 

2.1  Formulation 

There exists a single processor on which jobs can execute, and a set $N = \{1, \ldots, n\}$ of jobs, although
this number is not known beforehand. Each job $i$ is characterized by a tuple $\theta_i = (r_i, d_i, l_i, v_i)$, which
denotes the release time, deadline, length of processing time required, and value, respectively. The
space $\Theta_i$ of possible tuples is the same for each job and consists of all $\theta_i$ such that $r_i, d_i, l_i, v_i \in \Re^+$
(thus, the model of time is continuous). Each job is released at time $r_i$, at which point its three
other characteristics are known. Nothing is known about the job before its arrival. Each deadline is
firm (or, hard), which means that no value is obtained for a job that is completed after its deadline.
Preemption of jobs is allowed, and it takes no time to switch between jobs. Thus, job $i$ is completed
if and only if the total time it executes on the processor before $d_i$ is at least $l_i$.

Define the value density $\rho_i = \frac{v_i}{l_i}$ of job $i$ to be the ratio of its value to its length. For an
input $\theta = (\theta_1, \ldots, \theta_n)$, denote the maximum and minimum value densities as $\rho_{min} = \min_i \rho_i$ and
$\rho_{max} = \max_i \rho_i$. The importance ratio is then defined to be $\frac{\rho_{max}}{\rho_{min}}$, the maximal ratio of value
densities between two jobs. The algorithm is assumed to always know an upper bound $k$ on the
importance ratio. For simplicity, we normalize the range of possible value densities so that $\rho_{min} = 1$.
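
As a small illustration of these two definitions, the following Python sketch computes value densities and the importance ratio for a list of jobs. The `Job` class, its field names, and the sample jobs are assumptions made for this example and do not appear in the original text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical container for a job tuple (r_i, d_i, l_i, v_i)."""
    release: float   # r_i
    deadline: float  # d_i
    length: float    # l_i
    value: float     # v_i


def value_density(job: Job) -> float:
    """rho_i: ratio of a job's value to its length."""
    return job.value / job.length


def importance_ratio(jobs: list) -> float:
    """rho_max / rho_min over the given jobs."""
    densities = [value_density(j) for j in jobs]
    return max(densities) / min(densities)


# Two illustrative jobs with densities 2.0 and 1.0.
sample = [Job(0.0, 10.0, 2.0, 4.0), Job(1.0, 5.0, 4.0, 4.0)]
print(importance_ratio(sample))
```

Any upper bound $k$ supplied to an algorithm would need to be at least the value returned by `importance_ratio` for every input it may face.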

An online algorithm is a function $f : \Theta_1 \times \ldots \times \Theta_n \to X$ that maps the vector of tuples (for any
number $n$) to an alternative $x$. In this setting, an alternative $x \in X$ is simply a schedule of jobs on
the processor, recorded by the function $S : \Re^+ \to \{0, 1, \ldots, n\}$, which maps each point in time to
the active job, or to 0 if the processor is idle. We will use $S(\theta, t)$ as shorthand for the $S(t)$ of $f(\theta)$,
and it denotes the active job at time $t$ when the input is $\theta$.

To denote the total elapsed time that a job has spent on the processor at time $t$ when the input
is $\theta$, we will use the function $e_i(\theta, t) = \int_0^t \mu(S(\theta, x) = i)\,dx$, where $\mu(\cdot)$ is an indicator function
that returns 1 if the argument is true, and zero otherwise. A job's laxity at time $t$ is defined to be
$(d_i - t - l_i + e_i(\theta, t))$, the amount of time that it can remain inactive and still be completed by its
deadline. A job is abandoned if it cannot be completed by its deadline (formally, if $d_i - t + e_i(\theta, t) < l_i$).
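
The elapsed-time, laxity, and abandonment definitions can be sketched in Python as follows. The representation of a schedule as a list of `(start, end, job_id)` segments is an assumption made for this example; the text itself defines the schedule as a function of continuous time.

```python
def elapsed(segments, job_id, t):
    """e_i(theta, t): total time job_id has been active during [0, t].

    segments is a hypothetical schedule encoding: a list of
    (start, end, active_job) tuples with non-overlapping intervals.
    """
    return sum(
        max(0.0, min(end, t) - start)
        for start, end, active in segments
        if active == job_id
    )


def laxity(deadline, length, e, t):
    """Time the job can still remain inactive and finish by its deadline."""
    return deadline - t - length + e


def is_abandoned(deadline, length, e, t):
    """True if the job can no longer be completed by its deadline."""
    return deadline - t + e < length


# Illustrative job with deadline 5.5 and length 4.0 that has not yet run.
print(laxity(5.5, 4.0, 0.0, 1.5))        # zero laxity at time 1.5
print(is_abandoned(5.5, 4.0, 0.0, 2.0))  # cannot finish if idle until 2.0
```

Note that a job is abandoned exactly when its laxity is negative, since the two expressions differ only by rearrangement.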

Since a job cannot be executed before its release time, the space of possible schedules is restricted
in that $S(\theta, t) = i$ implies $r_i \leq t$. Also, because the online algorithm must produce the schedule over
time, without knowledge of future inputs, it must make the same decision at time $t$ for inputs that
are indistinguishable at this time. Formally, let $\theta(t)$ denote the subset of the tuples in $\theta$ that satisfy
$r_i \leq t$. The constraint is then that $\theta(t) = \theta'(t)$ implies $S(\theta, t) = S(\theta', t)$.

The objective function is the sum of the values of the jobs that are completed by their respective
deadlines: $W(f(\theta), \theta) = \sum_i (v_i \cdot \mu(e_i(\theta, d_i) \geq l_i))$. Let $W^*(\theta) = \max_{x \in X} W(x, \theta)$ denote the
maximum possible total value for the profile $\theta$.

In  competitive  analysis,  an  online  algorithm  is  evaluated  by  comparing  it  against  an  optimal 
offline algorithm. Because the offline algorithm knows the entire input $\theta$ at time 0 (but still cannot
start each job $i$ until time $r_i$), it always achieves $W^*(\theta)$. An online algorithm $f(\cdot)$ is (strictly) $c$-competitive
if there does not exist an input $\theta$ such that $c \cdot W(f(\theta), \theta) < W^*(\theta)$. An algorithm that
is  c-competitive  is  also  said  to  achieve  a  competitive  ratio  of  c. 
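
The objective function and the test for a violated competitive ratio can be written out as a short Python sketch. It checks a single input only; establishing that an algorithm is $c$-competitive requires the condition to hold for every input. The function names and the sample numbers are assumptions made for this example.

```python
def welfare(jobs, executed_by_deadline):
    """W: sum of values of jobs whose executed time by the deadline
    is at least their length.

    jobs is a list of (length, value) pairs; executed_by_deadline[i]
    plays the role of e_i(theta, d_i).
    """
    return sum(
        value
        for (length, value), e in zip(jobs, executed_by_deadline)
        if e >= length
    )


def violates_ratio(c, w_online, w_opt):
    """True if this input witnesses that the algorithm is not c-competitive."""
    return c * w_online < w_opt


# Illustrative input: the first job completes, the second does not.
w = welfare([(2.0, 3.0), (1.0, 5.0)], [2.0, 0.5])
print(w, violates_ratio(4, w, 8.0), violates_ratio(2, w, 8.0))
```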

Finally,  it  is  assumed  that  there  does  not  exist  an  overload  period  of  infinite  duration.  A  period  of 
time $[t^s, t^f]$ is overloaded if the sum of the lengths of the jobs whose release time and deadline both fall
within the time period exceeds the duration of the interval (formally, if $t^f - t^s < \sum_{i | (t^s \leq r_i,\, d_i \leq t^f)} l_i$).
Without  such  an  assumption,  it  is  not  possible  to  achieve  a  finite  competitive  ratio  [14] . 
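
A minimal Python sketch of the overload test for one candidate period follows. The encoding of jobs as `(release, deadline, length)` triples and the sample values are assumptions made for this example.

```python
def is_overloaded(jobs, t_s, t_f):
    """True if the jobs released and due within [t_s, t_f] need more
    processing time than the period contains.

    jobs is a list of hypothetical (release, deadline, length) triples.
    """
    demand = sum(
        length
        for release, deadline, length in jobs
        if t_s <= release and deadline <= t_f
    )
    return t_f - t_s < demand


sample = [(0.0, 0.9, 0.9), (0.5, 5.5, 4.0), (4.8, 17.0, 12.2)]
print(is_overloaded(sample, 0.0, 17.0))  # total demand exceeds 17.0
print(is_overloaded(sample, 0.0, 5.5))   # only the first two jobs count
```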

2.2  Previous  Results 

In the non-strategic setting, [3] presents a 4-competitive algorithm called $TD_1$ (version 2) for the
case of $k = 1$, while [14] presents a $(1 + \sqrt{k})^2$-competitive algorithm called $D^{over}$ for the general case
of $k \geq 1$. Matching lower bounds for deterministic algorithms for both of these cases were shown in
[3]. In this section we provide a high-level description of $TD_1$ (version 2) using an example.

$TD_1$ (version 2) divides the schedule into intervals, each of which begins when the processor
transitions from idle to busy (call this time $t^b$), and ends with the completion of a job. The first
active job of an interval may have laxity; however, for the remainder of the interval, preemption of
the active job is only considered when some other job has zero laxity. For example, when the input
is the set of jobs listed in Table 1, the first interval is the complete execution of job 1 over the range
[0.0, 0.9]. No preemption is considered during this interval, because job 2 does not have zero laxity
before time 1.5. Then, a new interval starts at $t^b = 0.9$ when job 2 becomes active. Before job 2 can
complete, preemption is considered at time 4.8, when job 3 is released with zero laxity.

In order to decide whether to preempt the active job, $TD_1$ (version 2) uses two more variables:
$t^e$ and $p\_loss$. The former records the latest deadline of a job that would be abandoned if the active
job executes to completion (or, if no such job exists, the time that the active job will finish if it
is not preempted). In this case, $t^e = 17.0$. The value $t^e - t^b$ represents an upper bound on the
amount of possible execution time "lost" to the optimal offline algorithm due to the completion of
the active job. The other variable, $p\_loss$, is equal to the length of the first active job of the current
interval. Because in general this job could have laxity, the offline algorithm may be able to complete
it outside of the range $[t^b, t^e]$.1 If the algorithm completes the active job and this job's length is at
least $\frac{t^e - t^b + p\_loss}{4}$, then the algorithm is guaranteed to be 4-competitive for this interval (note that
$k = 1$ implies that all jobs have the same value density and thus that lengths can be used to compute the
competitive ratio). Because this is not the case at time 4.8 (since $\frac{t^e - t^b + p\_loss}{4} = \frac{17.0 - 0.9 + 4.0}{4} > 4.0 = l_2$),
the algorithm preempts job 2 for job 3, which then executes to completion.


Job   r_i    d_i    l_i    v_i
1     0.0    0.9    0.9    0.9
2     0.5    5.5    4.0    4.0
3     4.8    17.0   12.2   12.2

Table 1: Input used to recap $TD_1$ (version 2) [3]. The up and down arrows represent $r_i$ and $d_i$,
respectively, while the length of the box equals $l_i$.
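
The preemption test applied at time 4.8 in the example above can be reproduced with a few lines of Python. This sketch covers only the length threshold for the case $k = 1$, not the full $TD_1$ (version 2) algorithm; the function and parameter names are assumptions made for this example.

```python
def keeps_active_job(active_length, t_b, t_e, p_loss):
    """True if the active job's length meets the threshold
    (t_e - t_b + p_loss) / 4, so that completing it is guaranteed
    to be 4-competitive for the current interval.
    """
    return active_length >= (t_e - t_b + p_loss) / 4


# Time 4.8 with the Table 1 input: job 2 (length 4.0) is active, the
# interval began at t_b = 0.9, job 3's deadline gives t_e = 17.0, and
# p_loss is the length of the interval's first active job (job 2).
threshold = (17.0 - 0.9 + 4.0) / 4
print(threshold)                                # about 5.025
print(keeps_active_job(4.0, 0.9, 17.0, 4.0))    # job 2 is preempted
```

Because the threshold exceeds job 2's length of 4.0, the check fails and the algorithm switches to job 3, matching the narrative.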


3  Mechanism  Design  Setting 

However, false information about job 2 would cause $TD_1$ (version 2) to complete this job. For
example, if job 2's deadline were declared as $\hat{d}_2 = 4.7$, then it would have zero laxity at time 0.7. At
this time, the algorithm would preempt job 1 for job 2, because $\frac{t^e - t^b + p\_loss}{4} = \frac{4.7 - 0.0 + 1.0}{4} > 0.9 = l_1$.
Job 2 would then complete before the arrival of job 3. 2

In  order  to  address  incentive  issues  such  as  this  one,  we  need  to  formalize  the  setting  as  a 
mechanism  design  problem.  In  this  section  we  first  present  the  mechanism  design  formulation,  and 
then  define  our  goals  for  the  mechanism. 

3.1  Formulation 

There exists a center, who controls the processor, and a set $N = \{1, \ldots, n\}$ of agents, where the
value of $n$ is unknown by the center beforehand. Each job $i$ is owned by a separate agent $i$. The
characteristics of the job define the agent's type $\theta_i \in \Theta_i$. At time $r_i$, agent $i$ privately observes its
type $\theta_i$, and has no information about job $i$ before $r_i$. Thus, jobs are still released over time, but
now each job is released only to the owning agent.

Agents interact with the center through a direct mechanism $\Gamma = (\Theta_1, \ldots, \Theta_n, g(\cdot))$, in which each
agent declares a job, denoted by $\hat{\theta}_i = (\hat{r}_i, \hat{d}_i, \hat{l}_i, \hat{v}_i)$, and the function $g : \Theta_1 \times \ldots \times \Theta_n \to O$ maps
the declared types to an outcome $o \in O$. An outcome $o = (f(\cdot), p_1, \ldots, p_n)$ consists of the online
algorithm $f(\cdot)$ that produces the schedule, and a payment from each agent to the mechanism.

1 While it would be easy to alter the algorithm to recognize that this is not possible for the jobs in Table 1, our
example does not depend on the use of $p\_loss$.

2 While we will not describe the significantly more complex $D^{over}$, we note that it is similar in its use of intervals
and its preference for the active job. Also, we note that the lower bound we will show in Section 5 implies that false
information can also benefit a job in $D^{over}$.

In a standard mechanism design setting, the outcome is enforced at the end of the mechanism.
However, since the end is not well-defined in this online setting, we choose to model, for each agent $i$,
the return of a completed job and the collection of a payment as occurring at $\hat{d}_i$ (which, according to
agent $i$'s declaration, is the latest relevant point of time for that agent). Thus, even if job $i$ is completed
before $\hat{d}_i$, the center does not return the job to agent $i$ until that time. This modelling decision
could instead be viewed as a decision by the mechanism designer from a larger space of possible
mechanisms. Indeed, as we will discuss later, this decision of when to return a completed job is
crucial to our mechanism.

The utility function each agent aims to maximize,
$u_i(g(\hat{\theta}), \theta_i) = v_i \cdot \mu(e_i(\hat{\theta}, \hat{d}_i) \geq l_i) \cdot \mu(\hat{d}_i \leq d_i) - p_i(\hat{\theta})$,
is a linear function of its value for its job (if completed and returned by its true deadline)
and the payment it makes to the center.
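
The utility function can be sketched in Python to make the role of the two indicator terms explicit. The function signature and the sample numbers are assumptions made for this example.

```python
def utility(value, true_length, true_deadline,
            declared_deadline, executed, payment):
    """Agent utility: the job's value counts only if the job is completed
    (executed time by the declared deadline is at least the true length)
    and returned by the true deadline (declared deadline is no later than
    the true one); the payment is always subtracted.
    """
    completed = executed >= true_length
    returned_in_time = declared_deadline <= true_deadline
    gain = value if (completed and returned_in_time) else 0.0
    return gain - payment


# Truthful deadline: the agent keeps value minus payment.
print(utility(10.0, 4.0, 5.5, 5.5, 4.0, 3.0))
# Declaring a later deadline: the job comes back too late to be of value.
print(utility(10.0, 4.0, 5.5, 6.0, 4.0, 3.0))
```

The second call shows why returning a completed job only at its declared deadline removes any benefit from overstating the deadline.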

Agent declarations are restricted in that an agent cannot declare a length shorter than the true
length, since the center would be able to detect such a lie if the job were completed. On the
other hand, in the general formulation we will allow agents to declare longer lengths, since in some
settings it may be possible to add unnecessary work to a job. However, we will also consider a restricted
formulation in which this type of lie is not possible. The declared release time $\hat{r}_i$ is the time that
the agent chooses to submit job $i$ to the center, and it cannot precede the time $r_i$ at which the job
is revealed to the agent. The agent can declare an arbitrary deadline or value. To summarize, agent
$i$ can declare any type $\hat{\theta}_i = (\hat{r}_i, \hat{d}_i, \hat{l}_i, \hat{v}_i)$ such that $\hat{l}_i \geq l_i$ and $\hat{r}_i \geq r_i$.

While in the non-strategic setting it was sufficient for the algorithm to know the upper bound $k$
on the ratio $\frac{\rho_{max}}{\rho_{min}}$, in the mechanism design setting we will strengthen this assumption so that the
mechanism also knows $\rho_{min}$ (or, equivalently, the range $[\rho_{min}, \rho_{max}]$ of possible value densities).3
While we feel that it is unlikely that a center would know $k$ without knowing this range, we later
present a mechanism that does not depend on this extra knowledge in a restricted setting.

The restriction on the schedule is now that $S(\hat{\theta}, t) = i$ implies $\hat{r}_i \leq t$, to capture the fact that
a  job  cannot  be  scheduled  on  the  processor  before  it  is  declared  to  the  mechanism.  As  before, 
preemption  of  jobs  is  allowed,  and  job  switching  takes  no  time. 

The constraints due to the online mechanism's lack of knowledge of the future are that $\hat{\theta}(t) = \hat{\theta}'(t)$
implies $S(\hat{\theta}, t) = S(\hat{\theta}', t)$, and $\hat{\theta}(\hat{d}_i) = \hat{\theta}'(\hat{d}_i)$ implies $p_i(\hat{\theta}) = p_i(\hat{\theta}')$ for each agent $i$. The setting can
then be summarized as follows.


Setting 1 Overview

for all $t$ do
    The center instantiates $S(\hat{\theta}, t) \leftarrow i$, for some $i$ s.t. $\hat{r}_i \leq t$
    if $\exists i, (r_i = t)$ then
        $\theta_i$ is revealed to agent $i$
    if $\exists i, (t \geq r_i)$ and agent $i$ has not declared a job then
        Agent $i$ can declare any job $\hat{\theta}_i$, s.t. $\hat{r}_i = t$ and $\hat{l}_i \geq l_i$
    if $\exists i, (\hat{d}_i = t) \wedge (e_i(\hat{\theta}, t) \geq l_i)$ then
        Completed job $i$ is returned to agent $i$
    if $\exists i, (\hat{d}_i = t)$ then
        Center sets and collects payment $p_i(\hat{\theta})$ from agent $i$


3.2  Mechanism  Goals 

Our aim as mechanism designer is to maximize the value of completed jobs, subject to the constraints
of (dominant strategy) incentive compatibility and individual rationality, for which we use
the standard definitions.4

3 Note that we could then force agent declarations to satisfy $\rho_{min} \leq \frac{\hat{v}_i}{\hat{l}_i} \leq \rho_{max}$. However, this restriction would
not decrease the lower bound on the competitive ratio.

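Definition 1 A direct mechanism satisfies incentive compatibility (IC) if:

$\forall i, \theta_i, \theta'_i, \hat{\theta}_{-i} : u_i(g(\theta_i, \hat{\theta}_{-i}), \theta_i) \geq u_i(g(\theta'_i, \hat{\theta}_{-i}), \theta_i)$

Definition 2 A direct mechanism satisfies individual rationality (IR) if:

$\forall i, \theta_i, \hat{\theta}_{-i} : u_i(g(\theta_i, \hat{\theta}_{-i}), \theta_i) \geq 0$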

The social welfare function that we aim to maximize is the same as the objective function of
the non-strategic setting: $W(f(\theta), \theta) = \sum_i (v_i \cdot \mu(e_i(\theta, d_i) \geq l_i))$. As in the non-strategic setting,
we will evaluate an online mechanism using competitive analysis to compare it against an optimal
offline mechanism (which we will denote by $\Gamma_{offline}$). An offline mechanism knows all of the types
at time 0, and thus can always achieve $W^*(\theta)$.5

Definition 3 An online mechanism $\Gamma$ is (strictly) $c$-competitive if it satisfies IC, and if there does
not exist a profile of agent types $\theta$ such that $c \cdot W(f(\theta), \theta) < W^*(\theta)$.

Note  that,  to  guarantee  that  the  competitive  ratio  has  been  achieved,  the  online  mechanism  must 
satisfy  IC,  in  order  to  ensure  that  the  declared  types  used  by  its  algorithm  are  indeed  the  true  types. 

4  Results 

In this section, we first present our main positive result: a $((1 + \sqrt{k})^2 + 1)$-competitive mechanism
($\Gamma_1$). After providing some intuition as to why $\Gamma_1$ satisfies individual rationality and incentive
compatibility, we formally prove first these two properties and then the competitive ratio. We then
consider a special case in which $k = 1$ and agents cannot lie about the length of their job, which
allows us to alter this mechanism so that it no longer requires either knowledge of $\rho_{min}$ or the
collection of payments from agents.

4.1  General  Setting 

For the full setting described above, we present $\Gamma_1$, which is formally defined below. Unlike $TD_1$
(version 2) and $D^{over}$, $\Gamma_1$ gives no preference to the active job. Instead, it always executes the
available job with the highest priority: $(\hat{v}_i + \sqrt{k} \cdot e_i(\hat{\theta}, t) \cdot \rho_{min})$. Each agent whose job is completed
is then charged the lowest value that it could have declared such that its job still would have been
completed, holding constant the rest of its declaration.

We now state our theoretical results for this mechanism (with proofs found in the appendix),
and provide intuition as to why $\Gamma_1$ addresses each of the incentive issues.

Theorem 1 Mechanism $\Gamma_1$ satisfies IR.

Theorem 2 Mechanism $\Gamma_1$ satisfies IC.

4 A possible argument against the need for incentive compatibility in this setting is that an agent's lie may actually
improve the schedule. In fact, this was the case in the example we showed for the false declaration $\hat{d}_2 = 4.7$. However,
if an agent lies due to incorrect beliefs over the future input, then the lie could instead make the schedule worse
(for example, if job 3 were never released, then job 1 would have been unnecessarily abandoned). Furthermore, if we
do not know the beliefs of the agents, and thus cannot predict how they will lie, then we can no longer provide a
competitive guarantee for our mechanism.

5 Another possibility is to allow only the agents to know their types at time 0, and to force $\Gamma_{offline}$ to be incentive
compatible so that agents will truthfully declare their types at time 0. However, this would not affect our results,
since executing a Clarke mechanism [8] at time 0 satisfies IC and IR, and always maximizes social welfare.
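
Mechanism 1 $\Gamma_1$

Execute $S(\hat{\theta}, \cdot)$ according to Algorithm 1
for all $i$ do
    if $e_i(\hat{\theta}, \hat{d}_i) \geq \hat{l}_i$ {Agent $i$'s job is completed} then
        $p_i(\hat{\theta}) \leftarrow \arg\min_{v'_i \geq 0} (e_i(((\hat{r}_i, \hat{d}_i, \hat{l}_i, v'_i), \hat{\theta}_{-i}), \hat{d}_i) \geq \hat{l}_i)$
    else
        $p_i(\hat{\theta}) \leftarrow 0$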


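Algorithm 1

for all $t$ do
    $Avail \leftarrow \{i \,|\, (t \geq \hat{r}_i) \wedge (e_i(\hat{\theta}, t) < \hat{l}_i) \wedge (e_i(\hat{\theta}, t) + \hat{d}_i - t \geq \hat{l}_i)\}$
    {Set of all released, non-completed, non-abandoned jobs}
    if $Avail \neq \emptyset$ then
        $S(\hat{\theta}, t) \leftarrow \arg\max_{i \in Avail} (\hat{v}_i + \sqrt{k} \cdot e_i(\hat{\theta}, t) \cdot \rho_{min})$
        {Break ties in favor of lower $\hat{r}_i$}
    else
        $S(\hat{\theta}, t) \leftarrow 0$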
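
To make the priority rule and the payment rule concrete, the following Python sketch simulates a discretized version of Algorithm 1 and approximates the payment of Mechanism 1 by bisection. Several simplifications are assumptions of this example rather than part of the original mechanism: time advances in unit steps, release times, deadlines, and lengths are integers, jobs are indexed from 0 with `None` marking an idle processor, and completion is assumed to be monotonic in the declared value so that bisection applies.

```python
import math


def run_algorithm_1(jobs, k, rho_min):
    """Unit-step simulation of the priority rule.

    jobs: list of declared tuples (r, d, l, v) with integer r, d, l.
    Returns (schedule, completed): schedule[t] is the index of the job run
    during [t, t+1) or None if idle; completed[i] tells whether job i
    received at least l units of processing.
    """
    elapsed = [0] * len(jobs)
    schedule = []
    horizon = max(d for _, d, _, _ in jobs)
    for t in range(horizon):
        # Released, non-completed, non-abandoned jobs.
        avail = [
            i for i, (r, d, l, _) in enumerate(jobs)
            if t >= r and elapsed[i] < l and elapsed[i] + d - t >= l
        ]
        if not avail:
            schedule.append(None)
            continue
        # Highest priority first; ties go to the lower declared release time.
        best = max(
            avail,
            key=lambda i: (jobs[i][3] + math.sqrt(k) * elapsed[i] * rho_min,
                           -jobs[i][0]),
        )
        elapsed[best] += 1
        schedule.append(best)
    completed = [elapsed[i] >= jobs[i][2] for i in range(len(jobs))]
    return schedule, completed


def critical_value(jobs, i, k, rho_min, hi, tol=1e-6):
    """Approximate the lowest declared value at which job i still completes,
    holding the rest of its declaration fixed. Returns None if job i does
    not complete even when declaring the value hi."""
    def completes(v):
        trial = list(jobs)
        r, d, l, _ = jobs[i]
        trial[i] = (r, d, l, v)
        return run_algorithm_1(trial, k, rho_min)[1][i]

    if not completes(hi):
        return None
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if completes(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Illustrative input with k = 4 and rho_min = 1.
jobs = [(0, 10, 10, 10.0), (2, 6, 4, 16.0)]
schedule, completed = run_algorithm_1(jobs, 4, 1.0)
print(schedule, completed)
print(critical_value(jobs, 1, 4, 1.0, hi=16.0))
```

In this input the second job arrives at time 2 with priority 16, which exceeds the first job's priority of $10 + \sqrt{4} \cdot 2 \cdot 1 = 14$, so it preempts and completes; the bisection accordingly converges to a payment of about 14.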


By the use of a payment rule similar to that of a second-price auction, $\Gamma_1$ satisfies both IC
with  respect  to  values  and  IR.  We  now  argue  why  it  satisfies  IC  with  respect  to  the  other  three 
characteristics.  Declaring  an  “improved”  job  (i.e.,  declaring  an  earlier  release  time,  a  shorter  length, 
or  a  later  deadline)  could  possibly  decrease  the  payment  of  an  agent.  However,  the  first  two  lies  are 
not  possible  in  our  setting,  while  the  third  would  cause  the  job,  if  it  is  completed,  to  be  returned 
to  the  agent  after  the  true  deadline.  This  is  the  reason  why  it  is  important  to  always  return  a 
completed  job  at  its  declared  deadline,  instead  of  at  the  point  at  which  it  is  completed. 

It remains to argue why an agent does not have incentive to “worsen” its job. First, note that
if the job is completed both for a truthful and for a worse, false declaration, then the payment of the
agent cannot decrease, since the completion of a job is monotonic in each of $\hat{r}_i$, $\hat{d}_i$, and $\hat{l}_i$. Second,
the only possible effects of an inflated length on the completion of a job are to delay it or cause it
to be abandoned, and the only possible effects of an earlier declared deadline are to cause it to be
abandoned or to cause it to be returned earlier (which has no effect on the agent’s utility in our
setting). On the other hand, it is less obvious why agents do not have incentive to declare a later
release time. Consider a mechanism $\Gamma_1'$ that differs from $\Gamma_1$ in that it does not preempt the active
job $i$ unless there exists another job $j$ such that $(\hat{v}_i + \sqrt{k} \cdot \hat{l}_i \cdot \rho_{\min}) < \hat{v}_j$. Note that as an active
job approaches completion in $\Gamma_1$, its condition for preemption approaches that of $\Gamma_1'$.

However, the types in Table 2 for the case of $k = 1$ show why an agent may have incentive to
delay the arrival of its job under $\Gamma_1'$. Job 1 becomes active at time 0, and job 2 is abandoned upon
its release at time 6, because $10 + 10 = v_1 + l_1 > v_2 = 13$. Then, at time 8, job 1 is preempted by
job 3, because $10 + 10 = v_1 + l_1 < v_3 = 22$. Job 3 then executes to completion, forcing job 1 to be
abandoned. However, job 2 had more “weight” than job 1, and would have prevented job 3 from
being executed if it had been the active job at time 8, since $13 + 13 = v_2 + l_2 > v_3 = 22$. Thus, if
agent 1 had falsely declared $\hat{r}_1 = 20$, then job 3 would have been abandoned at time 8, and job 1
would have completed over the range [20, 30].
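
The manipulation described above can be replayed with a small unit-step simulation of the altered mechanism. This is only a sketch under stated assumptions: the function name `run_altered` is hypothetical, time is discretized into unit steps, and an idle processor is assumed to pick the available job with the highest priority used by Algorithm 1, since the text specifies only the altered preemption condition.

```python
from math import sqrt

def run_altered(jobs, k=1.0, rho_min=1.0, horizon=30):
    """Unit-step sketch of the altered mechanism.

    jobs maps a job id to its declared (r, d, l, v).
    The active job i is preempted only by a job j with v_j > v_i + sqrt(k) * l_i * rho_min.
    Returns the set of completed job ids.
    """
    executed = {i: 0 for i in jobs}
    active = None
    for t in range(horizon):
        avail = [i for i, (r, d, l, v) in jobs.items()
                 if t >= r and executed[i] < l and executed[i] + d - t >= l]
        if active not in avail:
            active = None  # the active job completed (or none was running)
        if active is not None:
            _, _, l_a, v_a = jobs[active]
            threshold = v_a + sqrt(k) * l_a * rho_min
            challengers = [j for j in avail if j != active and jobs[j][3] > threshold]
        else:
            challengers = avail  # assumption: an idle processor takes any available job
        if challengers:
            active = max(challengers,
                         key=lambda j: (jobs[j][3] + sqrt(k) * executed[j] * rho_min,
                                        -jobs[j][0]))
        if active is not None:
            executed[active] += 1
    return {i for i, (r, d, l, v) in jobs.items() if executed[i] >= l}

# Types from Table 2: id -> (r, d, l, v).
truthful = {1: (0, 30, 10, 10), 2: (6, 19, 13, 13), 3: (8, 30, 22, 22)}
# Agent 1 falsely declares a release time of 20.
delayed = dict(truthful)
delayed[1] = (20, 30, 10, 10)

if __name__ == "__main__":
    print(run_altered(truthful))
    print(run_altered(delayed))
```

With truthful declarations only job 3 is completed, whereas the delayed declaration lets jobs 1 and 2 complete, matching the argument in the text.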

Intuitively, $\Gamma_1$ avoids this problem because of two properties. First, when a job becomes active,
it must have a greater priority than all other available jobs. Second, because a job’s priority can
only increase through the increase of the term $(\sqrt{k} \cdot e_i(\hat{\theta}, t) \cdot \rho_{\min})$, the rate of increase of a job’s
priority is independent of its characteristics. These two properties together imply that, while a job
is active, there cannot exist a time at which its priority is less than the priority that one of these
other jobs would have achieved by executing on the processor instead.
Job    r_i    d_i    l_i    v_i
1      0      30     10     10
2      6      19     13     13
3      8      30     22     22

[Figure: timeline of the three jobs on a time axis t from 0 to 30.]

Table 2: Jobs used to show why a slightly altered version of $\Gamma_1$ would not be incentive compatible
with respect to release times.


Using the fact that IC is satisfied, we can now prove that $\Gamma_1$ is $((1 + \sqrt{k})^2 + 1)$-competitive by
proving that the algorithm used by $\Gamma_1$ achieves this competitive ratio, assuming truthful inputs.

Theorem 3 Mechanism $\Gamma_1$ is $((1 + \sqrt{k})^2 + 1)$-competitive.
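
To get a feel for how this bound grows with $k$, the following sketch evaluates it next to the $(1 + \sqrt{k})^2$ bound that the text cites for the non-strategic setting. The function names are hypothetical, and the check that $k \ge 1$ is an assumption of the sketch.

```python
from math import sqrt

def strategic_ratio(k: float) -> float:
    """Competitive ratio (1 + sqrt(k))^2 + 1 stated in Theorem 3."""
    if k < 1:
        raise ValueError("this sketch assumes k >= 1")
    return (1 + sqrt(k)) ** 2 + 1

def nonstrategic_ratio(k: float) -> float:
    """The (1 + sqrt(k))^2 bound cited for the non-strategic setting."""
    if k < 1:
        raise ValueError("this sketch assumes k >= 1")
    return (1 + sqrt(k)) ** 2

if __name__ == "__main__":
    for k in (1, 4, 9, 16):
        print(k, nonstrategic_ratio(k), strategic_ratio(k))
```

For $k = 1$ the two functions return 4 and 5, and the gap between them is exactly one for every $k$.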

4.2 Special Case: Unalterable length and k=1

While  so  far  we  have  allowed  each  agent  to  lie  about  all  four  characteristics  of  its  job,  lying  about 
the  length  of  the  job  is  not  possible  in  some  settings.  For  example,  a  user  may  not  know  how  to 
alter  a  computational  problem  in  a  way  that  both  lengthens  the  job  and  allows  the  solution  of  the 
original  problem  to  be  extracted  from  the  solution  to  the  altered  problem.  Another  restriction  that 
is natural in some settings is uniform value densities ($k = 1$), which was the case considered by
[3]. If the setting satisfies these two conditions, then, by using Mechanism $\Gamma_2$ (formally described
below), we can achieve a competitive ratio of 5 (which is the same competitive ratio as $\Gamma_1$ for the
case of $k = 1$) without knowledge of $\rho_{\min}$ and without the use of payments. The latter property
may  be  necessary  in  settings  that  are  more  local  than  grid  computing  (e.g.,  within  a  company)  but 
in  which  the  users  are  still  self-interested.6 


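Mechanism 2 $\Gamma_2$

Execute $S(\hat{\theta}, \cdot)$ according to Algorithm 2
for all $i$ do
    $p_i(\hat{\theta}) \leftarrow 0$


Algorithm 2
for all $t$ do
    $Avail \leftarrow \{ i \mid (t \ge \hat{r}_i) \wedge (e_i(\hat{\theta}, t) < l_i) \wedge (e_i(\hat{\theta}, t) + \hat{d}_i - t \ge l_i) \}$
    if $Avail \ne \emptyset$ then
        $S(\hat{\theta}, t) \leftarrow \arg\max_{i \in Avail} (l_i + e_i(\hat{\theta}, t))$
        {Break ties in favor of lower $\hat{r}_i$}
    else
        $S(\hat{\theta}, t) \leftarrow 0$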


6 While payments are not required in this setting, $\Gamma_2$ can be changed to collect payments without affecting
incentive compatibility by charging some fixed fraction of $l_i$ for each job $i$ that is completed.
Theorem 4 When $k = 1$, and each agent $i$ cannot falsely declare $l_i$, Mechanism $\Gamma_2$ satisfies IR and IC.

Theorem 5 When $k = 1$, and each agent $i$ cannot falsely declare $l_i$, Mechanism $\Gamma_2$ is 5-competitive.

Since this mechanism is essentially a simplification of $\Gamma_1$, we omit proofs of these theorems.
Basically, the fact that $k = 1$ and $\hat{l}_i = l_i$ both hold allows $\Gamma_2$ to substitute the priority $(l_i + e_i(\hat{\theta}, t))$
for the priority used in $\Gamma_1$; and, since $\hat{v}_i$ is ignored, payments are no longer needed to ensure incentive
compatibility.


5  Competitive  Lower  Bound 

We now show that the competitive ratio of $(1 + \sqrt{k})^2 + 1$ achieved by $\Gamma_1$ is a lower bound for
deterministic online mechanisms, under a pair of conditions. First, we appeal to a third requirement
on a mechanism, non-negative payments (NNP), which requires that the center never pays an agent
(formally, $\forall i, \hat{\theta},\ p_i(\hat{\theta}) \ge 0$). While we did not require that our mechanisms satisfy this requirement,
we note that both $\Gamma_1$ and $\Gamma_2$ satisfy it trivially, and that, in the proof of this theorem (found in
the appendix), zero only serves as a baseline utility for an agent, and could be replaced by any
non-positive function of $\hat{\theta}_{-i}$.

Second, we restrict consideration to settings in which $k > 1$. For the case of $k = 1$, we can achieve
a competitive ratio of 4 by using TD1 (version 2) [3], and charging $\hat{l}_i \cdot \rho_{\min}$ to each agent $i$ whose
job is completed. Any agent who truthfully declares the length of his job is indifferent between
whether his job is completed or not, because $k = 1$ implies that his payment upon completion will
be equal to his value. If he inflates the length of his job, then he will have negative utility for its
completion. Thus, an agent has no incentive to falsely declare his type. However, we chose not
to use this mechanism for the restricted setting in the previous section, because agents never have
incentive to even participate (or to avoid declaring a type such that their job would never be completed).

Theorem 6 There does not exist a deterministic online mechanism that satisfies IC, IR, and NNP,
and that achieves a competitive ratio less than $(1 + \sqrt{k})^2 + 1$, for any $k > 1$.


6  Related  Work 

In  this  section  we  describe  related  work  other  than  the  two  papers  ([3]  and  [14])  on  which  this  work  is 
based.  Recent  work  related  to  this  scheduling  domain  has  focused  on  competitive  analysis  in  which 
the  online  algorithm  uses  a  faster  processor  than  the  offline  algorithm  (see,  e.g.,  [12,  13]).  Mechanism 
design  was  also  applied  to  a  scheduling  problem  in  [16].  In  their  model,  the  center  owns  the  jobs  in  an 
offline  setting,  and  it  is  the  agents  who  can  execute  them.  The  private  information  of  an  agent  is  the 
time  it  will  require  to  execute  each  job.  Several  incentive  compatible  mechanisms  are  presented  that 
are  based  on  approximation  algorithms  for  the  computationally  infeasible  optimization  problem. 

Online  execution  presents  a  different  type  of  algorithmic  challenge,  and  several  other  papers 
study  online  algorithms  or  mechanisms  in  economic  settings.  For  example,  [4]  considers  an  online 
market clearing setting, in which the auctioneer matches buy and sell bids (which are assumed to
be  exogenous)  that  arrive  and  expire  over  time.  In  [1],  a  general  method  is  presented  for  converting 
an  online  algorithm  into  an  online  mechanism  that  is  incentive  compatible  with  respect  to  values. 
Truthful  declaration  of  values  is  also  considered  in  [2]  and  [15],  which  both  consider  multi-unit  online 
auctions.  The  main  difference  between  the  two  is  that  the  former  considers  the  case  of  a  digital  good, 
which  thus  has  unlimited  supply.  It  is  pointed  out  in  [15]  that  their  results  continue  to  hold  when 
the  setting  is  extended  so  that  bidders  can  delay  their  arrival. 

The  only  other  work  we  are  aware  of  that  addresses  the  issue  of  incentive  compatibility  in  a 
real-time  system  is  [10],  which  considers  several  variants  of  a  model  in  which  the  center  allocates 
bandwidth to agents who declare both their value and their arrival time. A dominant strategy IC
mechanism  is  presented  for  the  variant  in  which  every  point  in  time  is  essentially  independent,  while 
a  Bayes-Nash  IC  mechanism  is  presented  for  the  variant  in  which  the  center’s  current  decision  affects 
the  cost  of  future  actions. 


7  Discussion 

In  this  paper,  we  considered  an  online  scheduling  domain  for  which  algorithms  with  the  best  possible 
competitive  ratio  had  been  found,  but  for  which  new  solutions  were  required  when  the  setting  is 
extended  to  include  self-interested  agents.  We  presented  a  mechanism  that  is  incentive  compatible 
with  respect  to  release  time,  deadline,  length  and  value,  and  that  only  increases  the  competitive 
ratio  by  one.  We  also  showed  how  this  mechanism  could  be  simplified  when  k  =  1  and  each  agent 
cannot  lie  about  the  length  of  its  job.  We  then  showed  a  matching  lower  bound,  under  a  pair  of 
conditions,  on  the  competitive  ratio  that  can  be  achieved  by  a  deterministic  mechanism. 

It  would  be  interesting  to  determine  whether  the  lower  bound  can  be  strengthened  by  removing 
the  restriction  of  non-negative  payments.  More  generally,  the  use  of  randomized  mechanisms  in  this 
setting  provides  an  unexplored  area  for  future  work. 
A Proofs

A.1 Proof of Theorem 1

Theorem 7 Mechanism $\Gamma_1$ satisfies IR.

Proof: For arbitrary $i, \theta_i, \hat{\theta}_{-i}$, if job $i$ is not completed, then agent $i$ pays nothing and thus has
a utility of zero; that is, $p_i(\theta_i, \hat{\theta}_{-i}) = 0$ and $u_i(g(\theta_i, \hat{\theta}_{-i}), \theta_i) = 0$. On the other hand, if job $i$
is completed, then its value must be at least agent $i$’s payment. Formally, $u_i(g(\theta_i, \hat{\theta}_{-i}), \theta_i) = v_i -
\arg\min_{v_i' \ge 0} \big( e_i(((r_i, d_i, l_i, v_i'), \hat{\theta}_{-i}), d_i) \ge l_i \big) \ge 0$ must hold, since $v_i' = v_i$ satisfies the condition. □


A.2 Proof of Theorem 2

To prove incentive compatibility, we need to show that for an arbitrary agent $i$ with type $\theta_i$, and
an arbitrary profile $\hat{\theta}_{-i}$ of declarations of the other agents, agent $i$ can never gain by making a false
declaration $\hat{\theta}_i \ne \theta_i$, subject to the constraints that $\hat{r}_i \ge r_i$ and $\hat{l}_i \ge l_i$. We break this proof into
lemmas.

We start by showing that, regardless of $\hat{v}_i$, if truthful declarations of $r_i$, $d_i$, and $l_i$ do not cause
job $i$ to be completed, then “worse” declarations of these variables (that is, declarations that satisfy
$\hat{r}_i \ge r_i$, $\hat{l}_i \ge l_i$ and $\hat{d}_i \le d_i$) can never cause the job to be completed. We break this part of the
proof into two lemmas, first showing that it holds for the release time, regardless of the declarations
of the other variables, and then for length and deadline.

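Lemma 8 In mechanism $\Gamma_1$, the following condition holds for all $i, \theta_i, \hat{\theta}_{-i}$:

$$\forall\, \hat{v}_i,\ \hat{l}_i \ge l_i,\ \hat{d}_i \le d_i,\ \hat{r}_i \ge r_i:\quad \big[ e_i(((\hat{r}_i, \hat{d}_i, \hat{l}_i, \hat{v}_i), \hat{\theta}_{-i}), \hat{d}_i) \ge \hat{l}_i \big] \Longrightarrow \big[ e_i(((r_i, \hat{d}_i, \hat{l}_i, \hat{v}_i), \hat{\theta}_{-i}), \hat{d}_i) \ge \hat{l}_i \big]$$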

Proof: Assume by contradiction that this condition does not hold; that is, job $i$ is not completed
when $r_i$ is truthfully declared, but is completed for some false declaration $\hat{r}_i \ge r_i$. We first analyze
the case in which the release time is truthfully declared, and then we show that job $i$ cannot be
completed when agent $i$ delays submitting it to the center.

Case I: Agent $i$ declares $\hat{\theta}_i' = (r_i, \hat{d}_i, \hat{l}_i, \hat{v}_i)$.

First, define the following three points in the execution of job $i$.

• Let $t^s = \arg\min_t \big( S((\hat{\theta}_i', \hat{\theta}_{-i}), t) = i \big)$ be the time that job $i$ first starts execution.

• Let $t^p = \arg\min_{t > t^s} \big( S((\hat{\theta}_i', \hat{\theta}_{-i}), t) \ne i \big)$ be the time that job $i$ is first preempted.

• Let $t^a = \arg\min_t \big( e_i((\hat{\theta}_i', \hat{\theta}_{-i}), t) + \hat{d}_i - t < \hat{l}_i \big)$ be the time that job $i$ is abandoned.

If $t^s$ and $t^p$ are undefined because job $i$ never becomes active, then let $t^s = t^p = t^a$.

Also, partition the jobs declared by other agents before $t^a$ into the following three sets.

• Let $X = \{ j \mid (\hat{r}_j < t^p) \wedge (j \ne i) \}$ consist of the jobs (other than $i$) that arrive before job $i$ is
first preempted.

• Let $Y = \{ j \mid (t^p \le \hat{r}_j \le t^a) \wedge (\hat{v}_j > \hat{v}_i + \sqrt{k} \cdot e_i((\hat{\theta}_i', \hat{\theta}_{-i}), \hat{r}_j)) \}$ consist of the jobs that arrive in
the range $[t^p, t^a]$ and that when they arrive have higher priority than job $i$ (note that we are
making use of the normalization that $\rho_{\min} = 1$).

• Let $Z = \{ j \mid (t^p \le \hat{r}_j \le t^a) \wedge (\hat{v}_j \le \hat{v}_i + \sqrt{k} \cdot e_i((\hat{\theta}_i', \hat{\theta}_{-i}), \hat{r}_j)) \}$ consist of the jobs that arrive in
the range $[t^p, t^a]$ and that when they arrive have lower priority than job $i$.

We now show that all active jobs during the range $(t^p, t^a]$ must be either $i$ or in the set $Y$.
Unless $t^p = t^a$ (in which case this property trivially holds), it must be the case that job $i$ has a
higher priority than an arbitrary job $x \in X$ at time $t^p$, since at the time just preceding $t^p$ job $x$ was
available and job $i$ was active. Formally, $\hat{v}_x + \sqrt{k} \cdot e_x((\hat{\theta}_i', \hat{\theta}_{-i}), t^p) < \hat{v}_i + \sqrt{k} \cdot e_i((\hat{\theta}_i', \hat{\theta}_{-i}), t^p)$ must
hold.7 We can then show that, over the range $[t^p, t^a]$, no job $x \in X$ runs on the processor. Assume
by contradiction that this is not true. Let $t^f \in [t^p, t^a]$ be the earliest time in this range that some job
$x \in X$ is active, which implies that $e_x((\hat{\theta}_i', \hat{\theta}_{-i}), t^f) = e_x((\hat{\theta}_i', \hat{\theta}_{-i}), t^p)$. We can then show that job $i$
has a higher priority at time $t^f$ as follows: $\hat{v}_x + \sqrt{k} \cdot e_x((\hat{\theta}_i', \hat{\theta}_{-i}), t^f) = \hat{v}_x + \sqrt{k} \cdot e_x((\hat{\theta}_i', \hat{\theta}_{-i}), t^p) <
\hat{v}_i + \sqrt{k} \cdot e_i((\hat{\theta}_i', \hat{\theta}_{-i}), t^p) \le \hat{v}_i + \sqrt{k} \cdot e_i((\hat{\theta}_i', \hat{\theta}_{-i}), t^f)$, contradicting the fact that job $x$ is active at time
$t^f$.

A similar argument applies to an arbitrary job $z \in Z$, starting at its release time $\hat{r}_z \ge t^p$, since
by definition job $i$ has a higher priority at that time. The only remaining jobs that can be active
over the range $(t^p, t^a]$ are $i$ and those in the set $Y$.

Case II: Agent $i$ declares $\hat{\theta}_i = (\hat{r}_i, \hat{d}_i, \hat{l}_i, \hat{v}_i)$, where $\hat{r}_i > r_i$.

We now show that job $i$ cannot be completed in this case, given that it was not completed in case
I. First, we can restrict the range of $\hat{r}_i$ that we need to consider as follows. Declaring $\hat{r}_i \in (r_i, t^s]$
would not affect the schedule, since $t^s$ would still be the first time that job $i$ executes. Also, declaring
$\hat{r}_i > t^a$ could not cause the job to be completed, since $\hat{d}_i - t^a < \hat{l}_i$ holds, which implies that job $i$
would be abandoned at its release. Thus, we can restrict consideration to $\hat{r}_i \in (t^s, t^a]$.

In order for declaring $\hat{\theta}_i$ to cause job $i$ to be completed, a necessary condition is that the execution
of some job $y^c \in Y$ must change during the range $(t^p, t^a]$, since the only jobs other than $i$ that
are active during that range are in $Y$. Let $t^c = \arg\min_{t \in (t^p, t^a]} \big[ \exists y^c \in Y,\ (S((\hat{\theta}_i', \hat{\theta}_{-i}), t) = y^c) \wedge
(S((\hat{\theta}_i, \hat{\theta}_{-i}), t) \ne y^c) \big]$ be the first time that such a change occurs. We will now show that for
any $\hat{r}_i \in (t^s, t^a]$, there cannot exist a job with higher priority than $y^c$ at time $t^c$, contradicting
$(S((\hat{\theta}_i, \hat{\theta}_{-i}), t) \ne y^c)$.

First note that job $i$ cannot have a higher priority, since there would have to exist a $t \in (t^p, t^c)$
such that $\exists y \in Y,\ (S((\hat{\theta}_i', \hat{\theta}_{-i}), t) = y) \wedge (S((\hat{\theta}_i, \hat{\theta}_{-i}), t) = i)$, contradicting the definition of $t^c$.

Now consider an arbitrary $y \in Y$ such that $y \ne y^c$. In case I, we know that job $y$ has lower
priority than $y^c$ at time $t^c$; that is, $\hat{v}_y + \sqrt{k} \cdot e_y((\hat{\theta}_i', \hat{\theta}_{-i}), t^c) < \hat{v}_{y^c} + \sqrt{k} \cdot e_{y^c}((\hat{\theta}_i', \hat{\theta}_{-i}), t^c)$. Thus,
moving to case II, job $y$ must replace some other job before $t^c$. Since $\hat{r}_y \ge t^p$, the condition is that
there must exist some $t \in (t^p, t^c)$ such that $\exists w \in Y \cup \{i\},\ (S((\hat{\theta}_i', \hat{\theta}_{-i}), t) = w) \wedge (S((\hat{\theta}_i, \hat{\theta}_{-i}), t) = y)$.

7 For simplicity, when we give the formal condition for a job x to have a higher priority than another job y, we
will assume that job x’s priority is strictly greater than job y’s, because, in the case of a tie that favors x, future ties
would also be broken in favor of job x.

Since $w \in Y$ would contradict the definition of $t^c$, we know that $w = i$. That is, the job that $y$
replaces must be $i$. By definition of the set $Y$, we know that $\hat{v}_y > \hat{v}_i + \sqrt{k} \cdot e_i((\hat{\theta}_i', \hat{\theta}_{-i}), \hat{r}_y)$. Thus, if
$\hat{r}_y \le t$, then job $i$ could not have executed instead of $y$ in case I. On the other hand, if $\hat{r}_y > t$, then
job $y$ obviously could not execute at time $t$, contradicting the existence of such a time $t$.

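Now consider an arbitrary job $x \in X$. We know that in case I job $i$ has a higher priority than
job $x$ at time $t^s$, or, formally, that $\hat{v}_x + \sqrt{k} \cdot e_x((\hat{\theta}_i', \hat{\theta}_{-i}), t^s) < \hat{v}_i + \sqrt{k} \cdot e_i((\hat{\theta}_i', \hat{\theta}_{-i}), t^s)$. We also know
that $\hat{v}_i + \sqrt{k} \cdot e_i((\hat{\theta}_i', \hat{\theta}_{-i}), t^c) < \hat{v}_{y^c} + \sqrt{k} \cdot e_{y^c}((\hat{\theta}_i', \hat{\theta}_{-i}), t^c)$. Since delaying $i$’s arrival will not affect the
execution up to time $t^s$, and since job $x$ cannot execute instead of a job $y \in Y$ at any time $t \in (t^p, t^c]$
by definition of $t^c$, the only way for job $x$’s priority to increase before $t^c$ as we move from case I to
II is to replace job $i$ over the range $(t^s, t^c]$. Thus, an upper bound on job $x$’s priority when agent $i$
declares $\hat{\theta}_i$ is:

$$\hat{v}_x + \sqrt{k} \cdot \big[ e_x((\hat{\theta}_i', \hat{\theta}_{-i}), t^s) + e_i((\hat{\theta}_i', \hat{\theta}_{-i}), t^c) - e_i((\hat{\theta}_i', \hat{\theta}_{-i}), t^s) \big]$$
$$< \hat{v}_i + \sqrt{k} \cdot \big[ e_i((\hat{\theta}_i', \hat{\theta}_{-i}), t^s) + e_i((\hat{\theta}_i', \hat{\theta}_{-i}), t^c) - e_i((\hat{\theta}_i', \hat{\theta}_{-i}), t^s) \big]$$
$$= \hat{v}_i + \sqrt{k} \cdot e_i((\hat{\theta}_i', \hat{\theta}_{-i}), t^c) < \hat{v}_{y^c} + \sqrt{k} \cdot e_{y^c}((\hat{\theta}_i', \hat{\theta}_{-i}), t^c).$$

Thus, even at this upper bound, job $y^c$ would execute instead of job $x$ at time $t^c$. A similar
argument applies to an arbitrary job $z \in Z$, starting at its release time $\hat{r}_z$. Since the sets $\{i\}, X, Y, Z$
partition the set of jobs released before $t^a$, we have shown that no job could execute instead of job
$y^c$, contradicting the existence of $t^c$, and completing the proof. □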


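Lemma 9 In mechanism $\Gamma_1$, the following condition holds for all $i, \theta_i, \hat{\theta}_{-i}$:

$$\forall\, \hat{v}_i,\ \hat{l}_i \ge l_i,\ \hat{d}_i \le d_i:\quad \big[ e_i(((r_i, \hat{d}_i, \hat{l}_i, \hat{v}_i), \hat{\theta}_{-i}), \hat{d}_i) \ge \hat{l}_i \big] \Longrightarrow \big[ e_i(((r_i, d_i, l_i, \hat{v}_i), \hat{\theta}_{-i}), d_i) \ge l_i \big]$$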

Proof: Assume by contradiction there exists some instantiation of the above variables such that
job $i$ is not completed when $l_i$ and $d_i$ are truthfully declared, but is completed for some pair of false
declarations $\hat{l}_i \ge l_i$ and $\hat{d}_i \le d_i$.

Note that the only effect that $\hat{d}_i$ and $\hat{l}_i$ have on the execution of the algorithm is on whether or
not $i \in Avail$. Specifically, they affect the two conditions: $(e_i(\hat{\theta}, t) < \hat{l}_i)$ and $(e_i(\hat{\theta}, t) + \hat{d}_i - t \ge \hat{l}_i)$.
Because job $i$ is completed when $\hat{l}_i$ and $\hat{d}_i$ are declared, the former condition (for completion) must
become false before the latter. Since truthfully declaring $l_i \le \hat{l}_i$ and $d_i \ge \hat{d}_i$ will only make the
former condition become false earlier and the latter condition become false later, the execution of
the algorithm will not be affected when moving to truthful declarations, and job $i$ will be completed,
a contradiction. □

We now use these two lemmas to show that the payment for a completed job can only increase
by falsely declaring “worse” $\hat{l}_i$, $\hat{d}_i$, and $\hat{r}_i$.


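Lemma 10 In mechanism $\Gamma_1$, the following condition holds for all $i, \theta_i, \hat{\theta}_{-i}$:

$$\forall\, \hat{l}_i \ge l_i,\ \hat{d}_i \le d_i,\ \hat{r}_i \ge r_i:\quad \arg\min_{v_i' \ge 0} \big[ e_i(((\hat{r}_i, \hat{d}_i, \hat{l}_i, v_i'), \hat{\theta}_{-i}), \hat{d}_i) \ge \hat{l}_i \big] \ge \arg\min_{v_i' \ge 0} \big[ e_i(((r_i, d_i, l_i, v_i'), \hat{\theta}_{-i}), d_i) \ge l_i \big]$$

Proof: Assume by contradiction that this condition does not hold. This implies that there exists
some value $v_i'$ such that the condition $(e_i(((\hat{r}_i, \hat{d}_i, \hat{l}_i, v_i'), \hat{\theta}_{-i}), \hat{d}_i) \ge \hat{l}_i)$ holds, but
$(e_i(((r_i, d_i, l_i, v_i'), \hat{\theta}_{-i}), d_i) \ge l_i)$ does not. Applying Lemmas 8 and 9:

$$(e_i(((\hat{r}_i, \hat{d}_i, \hat{l}_i, v_i'), \hat{\theta}_{-i}), \hat{d}_i) \ge \hat{l}_i) \Longrightarrow (e_i(((r_i, \hat{d}_i, \hat{l}_i, v_i'), \hat{\theta}_{-i}), \hat{d}_i) \ge \hat{l}_i) \Longrightarrow (e_i(((r_i, d_i, l_i, v_i'), \hat{\theta}_{-i}), d_i) \ge l_i),$$

a contradiction. □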

Finally,  the  following  lemma  tells  us  that  the  completion  of  a  job  is  monotonic  in  its  declared 
value. 


Lemma 11 In mechanism $\Gamma_1$, the following condition holds for all $i, \theta_i, \hat{\theta}_{-i}$:

$$\forall\, \hat{l}_i \ge l_i,\ \hat{d}_i \le d_i,\ \hat{r}_i \ge r_i,\ v_i' \ge \hat{v}_i:\quad \big[ e_i(((\hat{r}_i, \hat{d}_i, \hat{l}_i, \hat{v}_i), \hat{\theta}_{-i}), \hat{d}_i) \ge \hat{l}_i \big] \Longrightarrow \big[ e_i(((\hat{r}_i, \hat{d}_i, \hat{l}_i, v_i'), \hat{\theta}_{-i}), \hat{d}_i) \ge \hat{l}_i \big]$$

The proof, by contradiction, of this lemma is omitted because it is essentially identical to that of
Lemma 8 for $\hat{r}_i$. In case I, agent $i$ declares $(\hat{r}_i, \hat{d}_i, \hat{l}_i, v_i')$ and the job is not completed, while in case
II he declares $(\hat{r}_i, \hat{d}_i, \hat{l}_i, \hat{v}_i)$ and the job is completed. The analysis of the two cases then proceeds as
before: the execution will not change up to time $t^s$ because the initial priority of job $i$ decreases as
we move from case I to II; and, as a result, there cannot be a change in the execution of a job other
than $i$ over the range $(t^p, t^a]$.

We can now combine the lemmas to show that no profitable deviation is possible, proving Theorem 2.

Theorem 12 Mechanism $\Gamma_1$ satisfies IC.

Proof: For an arbitrary agent $i$, we know that $\hat{r}_i \ge r_i$ and $\hat{l}_i \ge l_i$ hold by assumption. We also
know that agent $i$ has no incentive to declare $\hat{d}_i > d_i$, because job $i$ would never be returned before
its true deadline. Then, because the payment function is non-negative, agent $i$’s utility could not
exceed zero. By IR, this is the minimum utility it would achieve if it truthfully declared $\theta_i$. Thus,
we can restrict consideration to $\hat{\theta}_i$ that satisfy $\hat{r}_i \ge r_i$, $\hat{l}_i \ge l_i$, and $\hat{d}_i \le d_i$. Again using IR, we
can further restrict consideration to $\hat{\theta}_i$ that cause job $i$ to be completed, since any other $\hat{\theta}_i$ yields a
utility of zero.

If truthful declaration of $\theta_i$ causes job $i$ to be completed, then by Lemma 10 any such false declaration
$\hat{\theta}_i$ could not decrease the payment of agent $i$. On the other hand, if truthful declaration does
not cause job $i$ to be completed, then declaring such a $\hat{\theta}_i$ will cause agent $i$ to have negative utility,
since, by Lemmas 11 and 10, it must be the case that:

$$v_i < \arg\min_{v_i' \ge 0} \big[ e_i(((r_i, d_i, l_i, v_i'), \hat{\theta}_{-i}), d_i) \ge l_i \big] \le \arg\min_{v_i' \ge 0} \big[ e_i(((\hat{r}_i, \hat{d}_i, \hat{l}_i, v_i'), \hat{\theta}_{-i}), \hat{d}_i) \ge \hat{l}_i \big].$$ □

A.3 Proof of Theorem 3

The proof of the competitive ratio, which makes use of techniques adapted from those used in [14],
is also broken into lemmas. Having shown IC, we can assume truthful declaration ($\hat{\theta} = \theta$), and it
remains to bound the loss of social welfare against $\Gamma_{\mathrm{offline}}$.

Denote by $(1, 2, \ldots, F)$ the sequence of jobs completed by $\Gamma_1$. Divide time into intervals $I_f =
(t_f^{\mathrm{open}}, t_f^{\mathrm{close}}]$, one for each job $f$ in this sequence. Set $t_f^{\mathrm{close}}$ to be the time at which job $f$ is completed,
and set $t_f^{\mathrm{open}} = t_{f-1}^{\mathrm{close}}$ for $f \ge 2$, and $t_1^{\mathrm{open}} = 0$ for $f = 1$. Also, let $t_f^{\mathrm{begin}}$ be the first time that the
processor is not idle in interval $I_f$.

Lemma 13 For any interval $I_f$, the following inequality holds: $t_f^{\mathrm{close}} - t_f^{\mathrm{begin}} \le \big(1 + \frac{1}{\sqrt{k}}\big) \cdot v_f$

Proof: Interval $I_f$ begins with a (possibly zero length) period of time in which the processor is
idle because there is no available job. Then, it continuously executes a sequence of jobs $(1, 2, \ldots, c)$,
where each job $i$ in this sequence is preempted by job $i + 1$, except for job $c$, which is completed
(thus, job $c$ in this sequence is the same as job $f$ in the global sequence of completed jobs). Let $t_i^s$
be the time that job $i$ begins execution. Note that $t_1^s = t_f^{\mathrm{begin}}$.

Over the range $[t_f^{\mathrm{begin}}, t_f^{\mathrm{close}}]$, the priority $(v_i + \sqrt{k} \cdot e_i(\theta, t))$ of the active job is monotonically
increasing with time, because this function linearly increases while a job is active, and can only
increase at a point in time when preemption occurs. Thus, each job $i > 1$ in this sequence begins
execution at its release time (that is, $t_i^s = r_i$), because its priority does not increase while it is not
active.

We now show that the value of the completed job $c$ exceeds the product of $\sqrt{k}$ and the time
spent in the interval on jobs 1 through $c - 1$, or, more formally, that the following condition holds:
$v_c \ge \sqrt{k} \sum_{h=1}^{c-1} \big( e_h(\theta, t_{h+1}^s) - e_h(\theta, t_h^s) \big)$. To show this, we will prove by induction that the stronger
condition $v_i \ge \sqrt{k} \sum_{h=1}^{i-1} e_h(\theta, t_{h+1}^s)$ holds for all jobs $i$ in the sequence.

Base Case: For $i = 1$, $v_1 \ge \sqrt{k} \sum_{h=1}^{0} e_h(\theta, t_{h+1}^s) = 0$, since the sum is over zero elements.

Inductive Step: For an arbitrary $1 \le i < c$, we assume that $v_i \ge \sqrt{k} \sum_{h=1}^{i-1} e_h(\theta, t_{h+1}^s)$ holds. At
time $t_{i+1}^s$, we know that $v_{i+1} \ge v_i + \sqrt{k} \cdot e_i(\theta, t_{i+1}^s)$ holds, because $t_{i+1}^s = r_{i+1}$. These two inequalities
together imply that $v_{i+1} \ge \sqrt{k} \sum_{h=1}^{i} e_h(\theta, t_{h+1}^s)$, completing the inductive step.
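
Spelled out, the inductive step chains the preemption inequality at time $t_{i+1}^s$ with the inductive hypothesis. The display below is only a restatement of the two inequalities already given in the text, written as one chain:

```latex
\begin{align*}
v_{i+1} &\ge v_i + \sqrt{k}\, e_i(\theta, t_{i+1}^s)
          && \text{(job $i+1$ preempts job $i$ at its release)} \\
        &\ge \sqrt{k} \sum_{h=1}^{i-1} e_h(\theta, t_{h+1}^s) + \sqrt{k}\, e_i(\theta, t_{i+1}^s)
          && \text{(inductive hypothesis)} \\
        &= \sqrt{k} \sum_{h=1}^{i} e_h(\theta, t_{h+1}^s).
\end{align*}
```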

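We also know that $t_f^{\mathrm{close}} - t_c^s \le l_c \le v_c$ must hold, by the simplifying normalization of $\rho_{\min} = 1$
and the fact that job $c$’s execution time cannot exceed its length. We can thus bound the total
execution time of $I_f$ by: $t_f^{\mathrm{close}} - t_f^{\mathrm{begin}} = (t_f^{\mathrm{close}} - t_c^s) + \sum_{h=1}^{c-1} \big( e_h(\theta, t_{h+1}^s) - e_h(\theta, t_h^s) \big) \le \big(1 + \frac{1}{\sqrt{k}}\big) v_f$.
□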

We now consider the possible execution of uncompleted jobs by $\Gamma_{\mathrm{offline}}$. Associate each job $i$
that is not completed by $\Gamma_1$ with the interval during which it was abandoned. All jobs are now
associated with an interval, since there are no gaps between the intervals, and since no job $i$ can be
abandoned after the close of the last interval at $t_F^{\mathrm{close}}$. Because the processor is idle after $t_F^{\mathrm{close}}$, any
such job $i$ would become active at some time $t \ge t_F^{\mathrm{close}}$, which would lead to the completion of some
job, creating a new interval and contradicting the fact that $I_F$ is the last one.

The following lemma is equivalent to Lemma 5.6 of [14], but the proof is different for our mechanism.

Lemma 14 For any interval $I_f$ and any job $i$ abandoned in $I_f$, the following inequality holds:

$v_i \le (1 + \sqrt{k}) v_f$.

Proof: Assume by contradiction that there exists a job $i$ abandoned in $I_f$ such that $v_i > (1 + \sqrt{k}) v_f$.
At $t_f^{\mathrm{close}}$, the priority of job $f$ is $v_f + \sqrt{k} \cdot l_f \le (1 + \sqrt{k}) v_f$. Because the priority of the active job
monotonically increases over the range $[t_f^{\mathrm{begin}}, t_f^{\mathrm{close}}]$, job $i$ would have a higher priority than the
active job (and thus begin execution) at some time $t \in [t_f^{\mathrm{begin}}, t_f^{\mathrm{close}}]$. Again applying monotonicity,
this would imply that the priority of the active job at $t_f^{\mathrm{close}}$ exceeds $(1 + \sqrt{k}) v_f$, contradicting the
fact that it is at most $(1 + \sqrt{k}) v_f$. □

As in [14], for each interval $I_f$, we give $\Gamma_{\mathrm{offline}}$ the following “gift”: $k$ times the amount of time
in the range $[t_f^{\mathrm{begin}}, t_f^{\mathrm{close}}]$ that it does not schedule a job. Additionally, we “give” the adversary $v_f$,
since the adversary may be able to complete this job at some future time, due to the fact that $\Gamma_1$
ignores deadlines. The following lemma is Lemma 5.10 in [14], and its proof now applies directly.

Lemma 15 [14] With the above gifts the total net gain obtained by the clairvoyant algorithm from
scheduling the jobs abandoned during $I_f$ is not greater than $(1 + \sqrt{k}) \cdot v_f$.

The intuition behind this lemma is that the best that the adversary can do is to take almost all
of the “gift” of $k \cdot (t_f^{\mathrm{close}} - t_f^{\mathrm{begin}})$ (intuitively, this is equivalent to executing jobs with the maximum
possible value density over the time that $\Gamma_1$ is active), and then begin execution of a job abandoned
by $\Gamma_1$ right before $t_f^{\mathrm{close}}$. By Lemma 14, the value of this job is bounded by $(1 + \sqrt{k}) \cdot v_f$. We can
now combine the results of these lemmas to prove Theorem 3.

Theorem  16  MechanismT x  is  ((1  +  Vk)2  +  l)- competitive. 

Proof: Using the fact that the way in which jobs are associated with the intervals partitions
the entire set of jobs, we can show the competitive ratio by showing that $\Gamma_1$ is $((1 + \sqrt{k})^2 + 1)$-competitive
for each interval in the sequence $(1, \ldots, F)$. Over an arbitrary interval $I_f$, the offline
algorithm can achieve at most $(t_f^{\text{close}} - t_f^{\text{begin}}) \cdot k + v_f + (1 + \sqrt{k}) v_f$, from the two gifts and the net
gain bounded by Lemma 15. Applying Lemma 13, this quantity is then bounded from above by
$(1 + \frac{1}{\sqrt{k}}) \cdot v_f \cdot k + v_f + (1 + \sqrt{k}) v_f = ((1 + \sqrt{k})^2 + 1) \cdot v_f$. Since $\Gamma_1$ achieves $v_f$, the competitive ratio
holds. □
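
The arithmetic in the last step can be checked numerically. The following Python sketch is an illustration only and is not part of the original proof: the function names are hypothetical, and it assumes, as the substitution above implies, that Lemma 13 bounds the interval length by $(1 + 1/\sqrt{k}) \cdot v_f$. It adds up the three offline terms per unit of $v_f$ and compares the total with the claimed ratio.

```python
import math


def offline_bound_per_unit_value(k: float) -> float:
    """Upper bound on the offline gain in one interval, divided by v_f."""
    root_k = math.sqrt(k)
    time_gift = (1 + 1 / root_k) * k  # k * (t_close - t_begin), bounded via Lemma 13 (assumed form)
    value_gift = 1.0                  # the extra v_f given to the adversary
    abandoned_gain = 1 + root_k       # net gain from abandoned jobs, Lemma 15
    return time_gift + value_gift + abandoned_gain


def claimed_ratio(k: float) -> float:
    """The competitive ratio stated in Theorem 16."""
    return (1 + math.sqrt(k)) ** 2 + 1


for k in (1.0, 2.5, 4.0, 9.0):
    print(k, offline_bound_per_unit_value(k), claimed_ratio(k))
```

The two columns should agree up to floating-point rounding for every $k \ge 1$, since $(1 + 1/\sqrt{k})k = k + \sqrt{k}$.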
A.4 Proof of Theorem 6

The proof of the lower bound uses an adversary argument similar to that used in [3] to show a lower
bound of $(1 + \sqrt{k})^2$ in the non-strategic setting, with the main novelty lying in the two perturbations
of the job sequence and the related incentive compatibility arguments. We first prove a lemma
relating to the recurrence used for this argument.


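Lemma 1 For any $k > 1$, for the recurrence defined by:

\[
l_{j+1} = \lambda \cdot l_j - k \cdot \sum_{h=1}^{j} l_h, \qquad l_1 = 1 \tag{1}
\]

where $(1 + \sqrt{k})^2 - 1 < \lambda < (1 + \sqrt{k})^2$, there exists an integer $m \ge 1$ such that:

\[
\frac{l_m + k \cdot \sum_{h=1}^{m-1} l_h}{l_m} > \lambda \tag{2}
\]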


Proof: This proof is a generalization of the one shown in [3] for the case of $k = 1$. We first show that
the existence of such a number $m$ is equivalent to the existence of an $m > 1$ such that $l_m < l_{m-1}$.
Rearranging Equation 2, and using Equation 1 to substitute in for $l_m$ yields:

\[
\begin{aligned}
l_m + k \cdot \sum_{h=1}^{m-1} l_h &> \lambda \cdot l_m \\
\lambda \cdot l_{m-1} - k \cdot \sum_{h=1}^{m-1} l_h + k \cdot \sum_{h=1}^{m-1} l_h &> \lambda \cdot l_m \\
l_m &< l_{m-1}
\end{aligned}
\]

We now show that there exists an $m > 1$ that satisfies $l_m < l_{m-1}$. Substituting $j$ in for $j + 1$ in
Equation 1 yields:


\[
l_j = \lambda \cdot l_{j-1} - k \cdot \sum_{h=1}^{j-1} l_h \tag{3}
\]

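Subtracting Equation 3 from Equation 1 produces:

\[
\begin{aligned}
l_{j+1} - l_j &= \lambda \cdot l_j - \lambda \cdot l_{j-1} - k \cdot l_j \\
l_{j+1} &= (\lambda + 1 - k) \cdot l_j - \lambda \cdot l_{j-1}
\end{aligned}
\]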


Thus, we can re-write the recurrence as:

\[
\begin{aligned}
l_{j+2} &= (\lambda + 1 - k) \cdot l_{j+1} - \lambda \cdot l_j \\
l_1 &= 1 \\
l_2 &= \lambda - k
\end{aligned}
\]

We now use the standard approach for solving linear homogeneous recurrence relations (see, e.g.,
[17]). The characteristic equation of the recurrence is:

\[
x^2 - (\lambda + 1 - k) x + \lambda = 0
\]

The roots of this equation are:

\[
x_1 = \frac{(\lambda + 1 - k) + \sqrt{(\lambda + 1 - k)^2 - 4\lambda}}{2}, \qquad
x_2 = \frac{(\lambda + 1 - k) - \sqrt{(\lambda + 1 - k)^2 - 4\lambda}}{2}
\]

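We now show that the roots are complex (that is, the discriminant is negative) by verifying the following inequality.

\[
\begin{aligned}
(\lambda + 1 - k)^2 &< 4\lambda \\
\lambda^2 + 2(1 - k)\lambda + (1 - k)^2 &< 4\lambda \\
k^2 - 2k + 1 &< \lambda \cdot (2k + 2 - \lambda)
\end{aligned}
\]

Using the condition that $\lambda = (1 + \sqrt{k})^2 - \epsilon$ for some $\epsilon \in (0, 1)$, it suffices to verify that the
following inequality holds for any such $\epsilon$.

\[
\begin{aligned}
k^2 - 2k + 1 &< [(1 + \sqrt{k})^2 - \epsilon] \cdot [2k + 2 - ((1 + \sqrt{k})^2 - \epsilon)] \\
k^2 - 2k + 1 &< [k + 2\sqrt{k} + 1 - \epsilon] \cdot [2k + 2 - k - 2\sqrt{k} - 1 + \epsilon] \\
k^2 - 2k + 1 &< [(k + 1) + (2\sqrt{k} - \epsilon)] \cdot [(k + 1) - (2\sqrt{k} - \epsilon)] \\
k^2 - 2k + 1 &< k^2 + 2k + 1 - 4k + 4\epsilon\sqrt{k} - \epsilon^2 \\
0 &< 4\epsilon\sqrt{k} - \epsilon^2 \\
0 &< 4\sqrt{k} - \epsilon
\end{aligned}
\]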

Because $k > 1$, this inequality holds for any $\epsilon \in (0, 1)$. The two roots can then be represented as
follows:

\[
x_1 = y + iz, \qquad x_2 = y - iz
\]

where

\[
y = \frac{\lambda + 1 - k}{2}, \qquad z = \frac{\sqrt{4\lambda - (\lambda + 1 - k)^2}}{2}
\]

Because the roots are distinct, the solution to the recurrence is of the form $l_{j+1} = \delta_1 x_1^j + \delta_2 x_2^j$,
where $\delta_1, \delta_2$ are constants that we now derive. The initial conditions give us the following equations:

\[
\begin{aligned}
1 &= \delta_1 + \delta_2 \\
\lambda - k &= \delta_1 \cdot (y + iz) + \delta_2 \cdot (y - iz)
\end{aligned}
\]

Solving these equations yields:

\[
\delta_1 = \frac{1}{2} + \frac{\lambda - k - y}{2iz}, \qquad
\delta_2 = \frac{1}{2} - \frac{\lambda - k - y}{2iz}
\]


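Because the pairs $(x_1, x_2)$ and $(\delta_1, \delta_2)$ are both complex conjugates with non-zero imaginary
parts, we can represent the recurrence as follows for some $\alpha, \beta, \theta, \omega \ne 0$:

\[
\begin{aligned}
l_{j+1} &= \alpha e^{i\theta} \cdot (\beta e^{i\omega})^j + \alpha e^{-i\theta} \cdot (\beta e^{-i\omega})^j \\
l_{j+1} &= \alpha \cdot \beta^j \left[ e^{i(\theta + j\omega)} + e^{-i(\theta + j\omega)} \right] \\
l_{j+1} &= \alpha \cdot \beta^j \left[ \cos(\theta + j\omega) + i \sin(\theta + j\omega) + \cos(-(\theta + j\omega)) + i \sin(-(\theta + j\omega)) \right] \\
l_{j+1} &= 2 \cdot \alpha \cdot \beta^j \cos(\theta + j\omega)
\end{aligned}
\]

Because $\omega \ne 0$, it must be the case that $\cos(\theta + j\omega) < 0$ for some value of $j \ge 0$. Thus, since
$\alpha, \beta > 0$, there must exist some $j \ge 0$ such that $l_{j+1} < 0$. Combined with the fact that $l_1 > 0$, this
implies that there must exist an $m > 1$ such that $l_m < l_{m-1}$, completing the proof. □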
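
To make the lemma concrete, the recurrence can simply be iterated. The sketch below is a hypothetical illustration in Python (the function names and the sample values $\lambda = 3.5$, $k = 1$ are our own choices, not taken from the proof). It generates $l_1, l_2, \ldots$ from Equation 1 until the first index $m$ with $l_m < l_{m-1}$, and then evaluates the left-hand side of Equation 2 at that $m$, assuming $l_m > 0$.

```python
def recurrence_terms(lam: float, k: float, max_terms: int = 1000) -> list:
    """Iterate Equation 1 until the first m with l_m < l_{m-1}; return [l_1, ..., l_m]."""
    lengths = [1.0]  # l_1 = 1
    while len(lengths) < max_terms:
        # l_{j+1} = lam * l_j - k * (l_1 + ... + l_j)
        lengths.append(lam * lengths[-1] - k * sum(lengths))
        if lengths[-1] < lengths[-2]:
            return lengths
    raise ValueError("no decrease found; is lam inside the range required by Lemma 1?")


def equation_2_lhs(lengths: list, k: float) -> float:
    """Left-hand side of Equation 2 for m = len(lengths); assumes l_m > 0."""
    return (lengths[-1] + k * sum(lengths[:-1])) / lengths[-1]


lengths = recurrence_terms(lam=3.5, k=1.0)
print(len(lengths), lengths[-1], equation_2_lhs(lengths, 1.0))
```

For these sample values the sequence rises through $l_6$ and first drops at $m = 7$, where the left-hand side of Equation 2 exceeds $\lambda = 3.5$, as the lemma predicts.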

We  now  present  the  proof  of  Theorem  6. 

Theorem 17 There does not exist a deterministic online mechanism that satisfies IC, IR, and NNP,
and that achieves a competitive ratio less than $(1 + \sqrt{k})^2 + 1$, for any $k > 1$.

Proof: Assume by contradiction that there exists a deterministic online mechanism $\Gamma$ that satisfies
IC, IR, and NNP, and that achieves a competitive ratio of $c = (1 + \sqrt{k})^2 + 1 - \epsilon$ for some $\epsilon > 0$.
Since a competitive ratio of $c$ implies a competitive ratio of $c + x$, for any $x > 0$, we assume without
loss of generality that $\epsilon < 1$. First, we will construct a profile of agent types $\theta$ using an adversary
argument. After possibly slightly perturbing $\theta$ to assure that a strictness property is satisfied, we
will then use a more significant perturbation of $\theta$ to reach a contradiction.

We now construct the original profile $\theta$. Pick an $\alpha$ such that $0 < \alpha < \epsilon$, and define $\delta = \frac{\alpha}{ck + 3k}$.
The adversary uses two sequences of jobs: minor and major. Minor jobs $i$ are characterized by
$l_i = \delta$, $v_i = k \cdot \delta$, and zero laxity. The first minor job is released at time 0, and $r_i = d_{i-1}$ for all
$i > 1$. The sequence stops whenever $\Gamma$ completes any job.

Major jobs also have zero laxity, but they have the smallest possible value ratio (that is, $v_i = l_i$).
The lengths of the major jobs that may be released, starting with $i = 1$, are determined by the
following recurrence relation.

\[
l_{i+1} = (c - 1 + \alpha) \cdot l_i - k \cdot \sum_{h=1}^{i} l_h, \qquad l_1 = 1
\]

The bounds on $\alpha$ imply that $(1 + \sqrt{k})^2 - 1 < c - 1 + \alpha < (1 + \sqrt{k})^2$, which allows us to apply Lemma 1.
Let $m$ be the smallest positive number such that $\frac{l_m + k \cdot \sum_{h=1}^{m-1} l_h}{l_m} > c - 1 + \alpha$.

The first major job has a release time of 0, and each major job $i > 1$ has a release time of
$r_i = d_{i-1} - \delta$, just before the deadline of the previous job. The adversary releases major job $i \le m$
if and only if each major job $j < i$ was executed continuously over the range $[r_j, r_{j+1}]$. No major job
is released after job $m$.

In order to achieve the desired competitive ratio, $\Gamma$ must complete some major job $f$, because
$\Gamma_{\text{offline}}$ can always at least complete major job 1 (for a value of 1), and $\Gamma$ can complete at most
one minor job (for a value of $\frac{\alpha}{c + 3} < 1$). Also, in order for this job $f$ to be released, the processor
time preceding $r_f$ can only be spent executing major jobs that are later abandoned. If $f < m$, then
major job $f + 1$ will be released and it will be the final major job. $\Gamma$ cannot complete job $f + 1$,
because $r_f + l_f = d_f > r_{f+1}$. Therefore, $\theta$ consists of major jobs 1 through $f + 1$ (or, $f$, if $f = m$),
plus minor jobs from time 0 through time $d_f$.

We now possibly perturb $\theta$ slightly. By IR, we know that $v_f \ge p_f(\theta)$. Since we will later need
this inequality to be strict, if $v_f = p_f(\theta)$, then change $\theta_f$ to $\theta'_f$, which differs from $\theta_f$ only in that
$v'_f = v_f + \delta$. By IC, job $f$ must still be completed by $\Gamma$ for the profile $(\theta'_f, \theta_{-f})$. If not, then
by IR and NNP we know that $p_f(\theta'_f, \theta_{-f}) = 0$, and thus that $u_f(g(\theta'_f, \theta_{-f}), \theta'_f) = 0$. However,
agent $f$ could then increase its utility by falsely declaring the original type of $\theta_f$, receiving a utility
of $u_f(g(\theta_f, \theta_{-f}), \theta'_f) = v'_f - p_f(\theta) = \delta > 0$, violating IC. Furthermore, agent $f$ must be charged
the same amount (that is, $p_f(\theta'_f, \theta_{-f}) = p_f(\theta)$), due to a similar incentive compatibility argument.
Thus, for the remainder of the proof, assume that $v_f > p_f(\theta)$.

We now use a more substantial perturbation of $\theta$ to complete the proof. If $f < m$, then define
$\theta'_f$ to be identical to $\theta_f$, except that $d'_f = d_{f+1} + l_f$, allowing job $f$ to be completely executed after
job $f + 1$ is completed. If $f = m$, then instead set $d'_f = d_f + l_f$.

We now show that, for the profile $(\theta'_f, \theta_{-f})$, $\Gamma$ must still execute job $f$ continuously over the
range $[r_f, r_f + l_f]$, thus preventing job $f + 1$ from being completed. Assume by contradiction that
this were not true. Then, at the original deadline of $d_f$, job $f$ is not completed. Consider the
possible profile $(\theta'_f, \theta_{-f}, \theta_x)$, which differs from the new profile only in the addition of a job $x$ which
has zero laxity, $r_x = d_f$, and $v_x = l_x = \max(d'_f - d_f, (c + 1) \cdot (l_f + l_{f+1}))$. Because this new profile
is indistinguishable from $(\theta'_f, \theta_{-f})$ to $\Gamma$ before time $d_f$, it must schedule jobs in the same way until
$d_f$. Then, in order to achieve the desired competitive ratio, it must execute job $x$ continuously until
its deadline, which is by construction at least as late as the new deadline $d'_f$ of job $f$. Thus, job
$f$ will not be completed, and, by IR and NNP, it must be the case that $p_f(\theta'_f, \theta_{-f}, \theta_x) = 0$ and
$u_f(g(\theta'_f, \theta_{-f}, \theta_x), \theta'_f) = 0$. Using the fact that $\theta$ is indistinguishable from $(\theta_f, \theta_{-f}, \theta_x)$ up to time $d_f$,
if agent $f$ falsely declared his type to be the original $\theta_f$, then its job would be completed by $d_f$ and
it would be charged $p_f(\theta)$. Its utility would then increase to $u_f(g(\theta_f, \theta_{-f}, \theta_x), \theta'_f) = v_f - p_f(\theta) > 0$,
contradicting IC.

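While $\Gamma$'s execution must be identical for both $(\theta_f, \theta_{-f})$ and $(\theta'_f, \theta_{-f})$, $\Gamma_{\text{offline}}$ can take advantage
of the change. If $f < m$, then $\Gamma$ achieves a value of at most $l_f + \delta$ (the value of job $f$ if it were
perturbed), while $\Gamma_{\text{offline}}$ achieves a value of at least $k \cdot (\sum_{h=1}^{f} l_h - 2\delta) + l_{f+1} + l_f$ by executing
minor jobs until $r_{f+1}$, followed by job $f + 1$ and then job $f$ (we subtract two $\delta$'s instead of one
because the last minor job before $r_{f+1}$ may have to be abandoned). Substituting in for $l_{f+1}$, the
competitive ratio is then at least:

\[
\begin{aligned}
\frac{k \cdot (\sum_{h=1}^{f} l_h - 2\delta) + l_{f+1} + l_f}{l_f + \delta}
&= \frac{k \cdot (\sum_{h=1}^{f} l_h) - 2k \cdot \delta + (c - 1 + \alpha) \cdot l_f - k \cdot (\sum_{h=1}^{f} l_h) + l_f}{l_f + \delta} \\
&= \frac{c \cdot l_f + (\alpha \cdot l_f - 2k \cdot \delta)}{l_f + \delta} \\
&\ge \frac{c \cdot l_f + ((ck + 3k)\delta - 2k \cdot \delta)}{l_f + \delta} \\
&> c
\end{aligned}
\]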

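If instead $f = m$, then $\Gamma$ achieves a value of at most $l_m + \delta$, while $\Gamma_{\text{offline}}$ achieves a value of at
least $k \cdot (\sum_{h=1}^{m} l_h - 2\delta) + l_m$ by completing minor jobs until $d_m = r_m + l_m$, and then completing job
$m$. The competitive ratio is then at least:

\[
\begin{aligned}
\frac{k \cdot (\sum_{h=1}^{m} l_h - 2\delta) + l_m}{l_m + \delta}
&= \frac{k \cdot (\sum_{h=1}^{m-1} l_h) - 2k \cdot \delta + k l_m + l_m}{l_m + \delta} \\
&> \frac{(c - 1 + \alpha) \cdot l_m - 2k \cdot \delta + k l_m}{l_m + \delta} \\
&= \frac{(c + k - 1) \cdot l_m + (\alpha l_m - 2k \cdot \delta)}{l_m + \delta} \\
&> c
\end{aligned}
\]

□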
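
The adversary construction can also be checked numerically for particular parameters. The following Python sketch is only an illustration under stated assumptions: the function name and the sample values $k = 1$, $\epsilon = 0.5$, $\alpha = 0.25$ are hypothetical choices of ours, and $m$ is located through the equivalent condition $l_m < l_{m-1}$ from Lemma 1. It builds the major job lengths, sets $\delta = \alpha / (ck + 3k)$, and evaluates the two lower bounds on the competitive ratio derived above for every possible completed job $f \le m$.

```python
import math


def adversary_ratio_bounds(k: float, eps: float, alpha: float, max_terms: int = 1000):
    """Return (c, m, bounds), where bounds[f-1] is the ratio lower bound when job f completes."""
    c = (1 + math.sqrt(k)) ** 2 + 1 - eps
    lam = c - 1 + alpha                # multiplier in the major-job recurrence
    delta = alpha / (c * k + 3 * k)    # length of each minor job
    lengths = [1.0]                    # l_1 = 1
    while len(lengths) < 2 or lengths[-1] >= lengths[-2]:
        if len(lengths) >= max_terms:
            raise ValueError("recurrence did not turn downward")
        lengths.append(lam * lengths[-1] - k * sum(lengths))
    m = len(lengths)
    bounds = []
    for f in range(1, m + 1):
        # minor jobs worth k per unit time, minus two possibly lost minor jobs, then job f
        offline = k * (sum(lengths[:f]) - 2 * delta) + lengths[f - 1]
        if f < m:
            offline += lengths[f]      # job f + 1 is also completed offline
        bounds.append(offline / (lengths[f - 1] + delta))
    return c, m, bounds


c, m, bounds = adversary_ratio_bounds(k=1.0, eps=0.5, alpha=0.25)
print(c, m, min(bounds))
```

For these sample values every bound exceeds $c = 4.5$, which is the contradiction the proof relies on.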
Abstract

We  generalize  the  framework  of  non-cooperative  computation  (NCC),  recently  introduced  by 
Shoham  and  Tennenholtz,  to  apply  to  cryptographic  situations.  We  consider  functions  whose 
inputs  are  held  by  separate,  self-interested  agents.  We  consider  four  components  of  each  agent’s 
utility  function:  (a)  the  wish  to  know  the  correct  value  of  the  function,  (b)  the  wish  to  prevent 
others  from  knowing  it,  (c)  the  wish  to  prevent  others  from  knowing  one’s  own  private  input, 
and  (d)  the  wish  to  know  other  agents’  private  inputs.  We  provide  an  exhaustive  game  theoretic 
analysis  of  all  24  possible  lexicographic  orderings  among  these  four  considerations,  for  the  case 
of  Boolean  functions  (mercifully,  these  24  cases  collapse  to  four).  In  each  case  we  identify  the 
class  of  functions  for  which  there  exists  an  incentive-compatible  mechanism  for  computing  the 
function.  In  this  article  we  only  consider  the  situation  in  which  the  inputs  of  different  agents 
are  probabilistically  independent. 


1  Introduction 

In this paper we analyze when it is possible for a group of agents to compute a function of their
privately known inputs when their own self-interests stand in the way. One motivation for studying this
class of problems is cryptography. Consider, for example, the problem of secure function evaluation
(SFE). In SFE, n agents each wish to compute a function of n inputs (where each agent i possesses
input i), without revealing their private inputs. An increasingly clever series of solutions to SFE
have  been  proposed  (e.g.,  [1,  2]).  But  if  these  protocols  are  the  answer,  what  exactly  is  the  question? 
Like  many  other  cryptographic  problems,  SFE  has  not  been  given  a  mathematical  definition  that 
includes  the  preferences  of  the  agents.  We  hasten  to  add  that  this  does  not  mean  that  the  solutions 
are  not  clever  or  useful;  they  are.  However,  to  prove  that  agents  will  actually  follow  a  protocol,  one 
needs  a  game-theoretic  definition  of  the  SFE  problem.  It  turns  out  that  the  game  theoretic  analysis 
provides  a  slightly  different  perspective  on  (e.g.,)  SFE;  the  paranoias  of  game  theorists  are  more 
extreme  than  the  traditional  paranoias  of  cryptographers  in  some  respects  and  less  so  in  others.  The 
difference  between  the  two  demands  a  more  complete  discussion  than  we  have  space  for,  and  we 
discuss  the  issue  in  more  depth  in  a  companion  paper. 

In this paper we do not speak about cryptography per se, but rather about a general framework
within which to think about cryptography and related phenomena. The framework is called
non-cooperative computing, or NCC for short. The term was introduced by Shoham and Tennenholtz in
[4], who adopt a narrower setting. The NCC framework of S&T is, however, too limited to account
for (e.g.) cryptography, and the goal of this paper is to extend it so that it does.

* This work was generously supported by DARPA grant F30602-00-2-0598 and by an NSF fellowship.

We give the formal definitions in the next section, but let us describe the NCC framework
intuitively. The setting includes n agents and an n-ary function f, such that agent i holds input
i. Broadly speaking, all the agents want to compute f correctly, but in fact each agent has several
independent considerations. In this article we take agent i's utility function to depend on the
following factors:

Correctness: i wishes to compute the function correctly.

Exclusivity: i wishes that the other agents do not compute the function correctly.

Privacy: i wishes that the other agents do not discover i's private input.

Voyeurism: i wishes to discover the private inputs of the other agents.

Of  course,  these  considerations  are  often  conflicting.  They  certainly  conflict  across  agents  -  one 
agent’s  privacy  conflicts  with  another  agent’s  voyeurism.  But  they  also  conflict  within  a  given  agent 
-  the  wish  to  compute  the  function  may  propel  the  agent  to  disclose  his  private  input,  but  his  privacy 
concerns  may  prevent  it.  So  the  question  is  how  to  amalgamate  these  different  considerations  into 
one  coherent  preference  function. 

In this paper we consider lexicographic orderings. In the extended abstract, we analyze all 24
possible orderings of these considerations, while in the full paper we consider all possible orderings on
all subsets of the considerations. In each case we ask for which functions f there exists a mechanism
in the sense of mechanism design [3], such that in the game induced by the mechanism, it is a
Bayes-Nash equilibrium for the agents to disclose their true inputs. Of course, to do that we must
be explicit about the probability distribution from which the agents' inputs are drawn.

This  is  a  good  point  at  which  to  make  clear  the  connection  between  our  setting  and  the  restricted 
NCC  setting  of  S&T: 

•  S&T  consider  only  correctness  and  exclusivity  (and,  in  particular,  only  the  ordering  in  which 
correctness  precedes  exclusivity). 

•  S&T  consider  both  the  case  in  which  the  inputs  of  the  agents  are  independently  distributed 
and  the  case  in  which  they  are  correlated. 

•  S&T  consider  also  a  version  of  the  setting  in  which  agents  are  willing  to  mis-compute  the 
function  with  a  small  probability,  and  another  version  in  which  agents  can  be  offered  money, 
in  addition  to  their  inherent  informational  incentives. 

We  not  only  consider  privacy  and  voyeurism  in  addition  to  correctness  and  exclusivity,  but  also 
consider  all  24  possible  orderings  among  them  (mercifully,  in  the  Boolean  case  which  we  investigate 
they  collapse  to  four  equivalence  classes),  maintaining  the  property  that  all  agents  have  the  same 
ordering  over  the  considerations.  However,  in  this  paper  we  do  not  investigate  the  case  of  correlated 
values,  nor  the  probabilistic  and  monetary  extensions.  We  leave  those  to  future  work. 

There  is  one  additional  sense  in  which  our  treatment  is  more  general.  Consider  for  example  the 
consideration  of  correctness,  and  three  possible  outcomes:  in  the  first  the  agent  believes  the  correct 
value  with  probability  .6,  in  the  second  with  probability  .99,  and  in  the  third  with  probability  1. 
Holding  all  other  considerations  equal,  how  should  the  agent  rank  these  outcomes?  Clearly  the  third 
is  preferred  to  the  others,  but  what  about  those  two?  Here  we  have  two  versions;  in  one,  the  first 
two  are  equally  desirable  (in  other  words,  any  belief  less  than  certainty  is  of  no  value),  and  in  the 
other  the  second  is  preferred  to  the  first.  We  call  those  two  settings  the  full  information  gain  setting 
and  the  partial  information  gain  setting,  respectively. 

This  means  that  rather  than  24  cases  we  need  to  investigate,  we  have  48.  But  again,  luck  is  on 
our  side,  and  we  will  be  able  to  investigate  a  small  number  of  equivalence  classes  among  these  cases. 

In  the  next  section  we  give  the  precise  definitions,  and  in  the  following  sections  we  summarize 
our  results.  Several  of  the  insights  into  our  results  are  derived  from  the  results  obtained  by  S&T; 
we  will  try  to  indicate  clearly  when  that  is  the  case. 
2  Formulation

2.1  Formal  problem  definition

As in NCC, let $N = \{1, 2, \ldots, n\}$ be a set of agents, and consider also a non-strategic center which
will execute the protocol. We assume that each agent $1, \ldots, n$ has a private and authenticated channel
between itself and the center. Each agent has an input $V_i$ drawn from the set $B_i$. We will use $v_i$
(as shorthand for $V_i = v_i$) to represent a particular (but unspecified) value for $V_i$. The vector
$v = (v_1, \ldots, v_n)$ consists of the types of all agents, while $v_{-i} = (v_1, \ldots, v_{i-1}, v_{i+1}, \ldots, v_n)$ is this
same vector without the type of agent $i$ ($v_{-ij}$ simply extends this to removing two types). $P(V)$
is the joint probability over all players' inputs, which induces a $P_i(V_i)$ for each agent. Each agent
knows his own type, but does not know the types of the other agents. Instead, the prior $P(V)$
(which we assume has full support, that is, $\forall v\; P(v) > 0$) is common knowledge among the agents
and known by the mechanism designer. We further assume that the agent types are independent.

The commonly-known function that the agents are trying to compute is denoted by $f : B_1 \times \ldots \times B_n \to B_0$.
Though the setting makes sense in the case of an arbitrary field, we restrict ourselves
in this work to the case of Boolean functions over Boolean inputs ($B = B_i = \{0, 1\}$). We assume
that all agents are relevant to the function in that they have at least some chance of affecting the
outcome. Formally, this means that, for each agent $i$, $\exists v_i, v_{-i}\; f(v_i, v_{-i}) \ne f(\neg v_i, v_{-i})$.

2.2  The  mechanism  design  problem 

In  general,  a  mechanism  is  a  protocol  specifying  both  the  space  of  legal  messages  that  the  individual 
agents  can  send  to  the  center  and,  based  on  these  messages,  what  the  center  will  return  to  each 
agent.  We  seek  that,  for  all  possible  input  values,  all  agents  believe  in  the  correct  value  of  the 
function  at  the  end  of  the  protocol.  Since  dominant  strategy  implementation  is  not  feasible  in  our 
setting,  we  are  looking  for  a  Bayes-Nash  implementation. 

A  standard  mechanism  is  a  mapping  from  actions  to  outcomes.  The  setting  of  NCC  is  somewhat 
different  from  the  standard  mechanism  design  setting,  however.  In  the  case  of  NCC,  an  outcome 
is  a  complete  set  of  belief  states,  but  instead  of  mapping  from  actions  to  outcomes,  the  mechanism 
instead  gives  a  signal  to  each  player,  who  interprets  it  according  to  his  belief  strategy.  Thus,  a 
mechanism  cannot  enforce  outcomes:  it  can  only  control  the  information  embedded  in  its  signal  to 
each  player.  As  we  shall  see,  this  will  be  sufficient  for  our  purposes.  A  player’s  preferences  over  his 
and  others’  belief  states  are  defined  with  respect  to  the  correct  inputs  to  and  outputs  of  the  function, 
as  determined  by  the  private  types  of  the  other  players. 

A priori, one could imagine arbitrarily complicated mechanisms in which the agents and the
center iterate through many rounds of messages before converging on a result. However, following
Shoham and Tennenholtz, we note that an extended revelation principle (extended from, e.g., [3])
allows us, without loss of generality, to restrict our attention to mechanisms in which the agents
truthfully declare an input to the center and accept the result returned to them.

Theorem  1  (Extended  Revelation  Principle)  If  there  exists  a  protocol  for  the  center  and  the 
agents  in  which  each  agent  computes  the  correct  value  of  the  function,  then  there  exists  a  truthful, 
direct  protocol  in  which  each  agent  accepts  the  center’s  output  and  thereby  computes  the  correct  value 
of  the  function. 

Formally, a mechanism is a tuple $(S_1, \ldots, S_n, g_1, \ldots, g_n)$, consisting of, for each agent $i$, a strategy
space $S_i$ and a function $g_i$ that determines the output returned to the agent. A strategy of agent
$i$ consists of the following tuple of functions: $(s_i : B \to \Delta B,\; b_i^f : B \times B \to \Delta B,\; b_i^1 : B \times B \to \Delta B,\; \ldots,\; b_i^n : B \times B \to \Delta B)$.
The first maps agent $i$'s true type to a distribution over its declared
type (which we will sometimes refer to as $\hat{v}_i$). The second maps $i$'s true type $v_i$ and the center's
response $g_i(\hat{v})$ to $i$'s beliefs about the output of the function $f$. The remaining functions map $i$'s
type and the center's response to $i$'s beliefs about each agent $j$'s private input. We shall henceforth
refer to the tuple of belief functions, which together map to a complete belief state of agent $i$, as
$b_i$, the agent's belief strategy. Agents may have other higher-order beliefs, but we can neglect these
since they are not relevant to any agent's preferences.

The set of outcomes $O$ is the set of distributions over the belief states of the agents about the
input and output values; that is, $O = (\Delta B \times \Delta B^n)^n$. We wish to implement the social choice
function $W$ which always selects an outcome in which, for all agents $i$, $\Pr(b_i^f(v_i, g_i(v)) = f(v)) = 1$.
That is, in our desired outcome, each agent always computes the correct value of the function. In
this paper, we restrict the range of $g_i : B^n \to B$ so that it returns to agent $i$ a bit (to represent a
possible output of the function) for each set of declared values $v$. Since we wish every agent to always
compute correctly, we can restrict the center's protocol to computing and returning the function
$f(v)$ to each player (i.e., $g_i(v) = f(v)$).

The  agents’  preferences  are  defined  over  the  belief  states  which  form  the  set  of  outcomes.  We 
now  give  a  more  formal  definition  of  the  incentives  of  each  agent,  first  for  the  full  information  gain 
setting. 

Correctness: $i$ wishes that $\Pr(b_i^f(v_i, g_i(v)) = f(v)) = 1$.

Exclusivity: For each $j \ne i$, $i$ wishes that $\Pr(b_j^f(v_j, g_j(v)) = f(v)) \ne 1$.

Privacy: For each $j \ne i$, $i$ wishes that $\Pr(b_j^i(v_j, g_j(v)) = v_i) \ne 1$.

Voyeurism: For each $j \ne i$, $i$ wishes that $\Pr(b_i^j(v_i, g_i(v)) = v_j) = 1$.

In the partial information gain setting, agent valuations depend on more than whether or not
a probability is equal to 1. Instead, agents attempt to maximize the entropy function, which for a
distribution $\Pr(X)$ over a Boolean variable $X$ is defined as:
$H(X) = -\Pr(X = 0) \cdot \log_2 \Pr(X = 0) - \Pr(X = 1) \cdot \log_2 \Pr(X = 1)$.
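
As a small worked illustration of this definition, the sketch below evaluates $H$ for a Boolean variable in Python. The function name is hypothetical, and the code adopts the usual convention $0 \cdot \log_2 0 = 0$, which the text does not state explicitly.

```python
import math


def boolean_entropy(p_one: float) -> float:
    """Entropy H(X), in bits, of a Boolean variable X with Pr(X = 1) = p_one."""
    if not 0.0 <= p_one <= 1.0:
        raise ValueError("p_one must be a probability")
    total = 0.0
    for p in (1.0 - p_one, p_one):
        if p > 0.0:  # convention (assumed): 0 * log2(0) contributes nothing
            total -= p * math.log2(p)
    return total


for p in (0.6, 0.99, 1.0):
    print(p, boolean_entropy(p))
```

The three probabilities are the ones used in the introduction's example: a belief held with probability 1 has zero entropy, while beliefs held with probability .99 and .6 have strictly positive and increasing entropy, which is what lets the partial information gain setting distinguish them.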

Because of the way in which we can reduce the space of mechanisms that we need to consider,
we can restate our goal as follows. In this paper we characterize, for each possible ordering on the
four incentives listed above, the set of functions $f$ for which it is a Bayes-Nash equilibrium for each
agent $i$ to use a strategy $(s_i(v_i) = v_i,\; b_i^f(v_i, f(v)) = f(v),\; \ldots)$, that is, always telling the truth and
believing the output of the mechanism.


3  Full  information  gain  setting 

In  this  section  we  consider  the  full  information  gain  setting,  in  which  we  assume  that  agents  are 
only  concerned  with  what  they  and  the  other  agents  know  with  certainty,  as  opposed  to  what  they 
can  know  with  some  probability.  We  now  characterize  the  set  of  functions  that  are  NCC  for  each  of 
24  possible  orderings  of  the  incentives,  which  can  be  broken  into  four  cases. 

Before we begin, we review two definitions and a theorem from S&T [4] that will play an important
role in our impossibility results. We say that $f$ is (locally) dominated if there exists a type for some
agent which determines the output of the function. Formally, the condition is that there exist an
$i$ and $v_i$ such that $\forall v_{-i}, v'_{-i}$, $f(v_i, v_{-i}) = f(v_i, v'_{-i})$. We say that a function $f$ is reversible if there
exists an agent $i$ such that $\forall v_{-i}$, $f(V_i = 0, v_{-i}) \ne f(V_i = 1, v_{-i})$, which means that for either input,
agent $i$ knows what the value of the function would have been if he had submitted the other input.

Theorem 2 (Shoham and Tennenholtz) When agents value correctness over exclusivity, and
value no other consideration, a function is NCC if and only if it is non-reversible and non-dominated.
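
For small Boolean functions, the two definitions and the condition of Theorem 2 can be checked by brute-force enumeration. The Python sketch below is illustrative only: the function names are hypothetical, a function is represented as a callable on a tuple of bits, and agents are indexed from 0.

```python
from itertools import product


def with_input(rest: tuple, i: int, value: int) -> tuple:
    """Insert agent i's input into a profile of the other n - 1 agents."""
    return rest[:i] + (value,) + rest[i:]


def is_dominated(f, n: int) -> bool:
    """True if some agent has an input that fixes the output for all v_{-i}."""
    for i in range(n):
        for value in (0, 1):
            outputs = {f(with_input(rest, i, value)) for rest in product((0, 1), repeat=n - 1)}
            if len(outputs) == 1:
                return True
    return False


def is_reversible(f, n: int) -> bool:
    """True if some agent flips the output by flipping its input, for every v_{-i}."""
    for i in range(n):
        if all(f(with_input(rest, i, 0)) != f(with_input(rest, i, 1))
               for rest in product((0, 1), repeat=n - 1)):
            return True
    return False


def satisfies_theorem_2(f, n: int) -> bool:
    """Condition of Theorem 2: non-reversible and non-dominated."""
    return not is_reversible(f, n) and not is_dominated(f, n)


def xor2(v): return v[0] ^ v[1]
def and2(v): return v[0] & v[1]
def majority3(v): return int(sum(v) >= 2)

print(is_reversible(xor2, 2), is_dominated(and2, 2), satisfies_theorem_2(majority3, 3))
```

Two-input parity is reversible and two-input AND is dominated (an input of 0 forces the output), so neither meets the condition; three-input majority is neither reversible nor dominated.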

We  can  restrict  the  class  of  functions  to  consider  in  the  current  setting  by  noting  that  any  function 
which  is  not  NCC  in  the  S&T  sense  is  not  NCC  for  any  ordering  of  the  four  incentives. 

Theorem  3  Any  function  that  is  reversible  or  dominated  is  not  NCC. 
3.1  Exclusivity  and  correctness

We  can  tackle  half  of  the  orderings  at  once  by  considering  the  case  in  which  all  agents  rank  exclusivity 
over  correctness.  Not  surprisingly,  all  is  lost  in  this  case. 

Theorem  4  If  exclusivity  is  ranked  over  correctness,  then  no  function  is  NCC. 

On  the  other  hand,  we  find  that  the  converse  of  Theorem  3  holds  when  correctness  is  ranked 
above  all  other  factors. 

Theorem 5 If correctness is ranked over all other factors, then a function is NCC if and only if
it is non-reversible and non-dominated.

3.2  Privacy  over  correctness 

We are now down to six cases, in which correctness must be ranked second or third, and exclusivity
must be ranked below it. For the four of these cases in which privacy is ranked over correctness,
the key concept is what we call a privacy violation, which occurs when an agent has an input for
which there is a possible output that would allow the agent to determine another agent's input
with certainty. Formally, we say that a privacy violation for agent $i$ by agent $j$ occurs whenever
$\exists v_j, x, y\; \forall v_{-j}\; (f(v_j, v_{-j}) = x) \Rightarrow (v_i = y)$.
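
Privacy violations can likewise be enumerated for small functions. The sketch below is a hypothetical Python illustration, with agents indexed from 0. Following the informal description ("a possible output"), it only counts outputs $x$ that can actually occur given $v_j$; this non-emptiness requirement is our reading of the prose and is not spelled out in the formal condition.

```python
from itertools import product


def privacy_violations(f, n: int) -> list:
    """List (i, j, v_j, x, y): holding v_j and seeing output x tells agent j that v_i = y."""
    found = []
    for j in range(n):
        for vj in (0, 1):
            profiles = [v for v in product((0, 1), repeat=n) if v[j] == vj]
            for x in (0, 1):
                consistent = [v for v in profiles if f(v) == x]
                if not consistent:
                    continue  # assumption: x must be a possible output given v_j
                for i in range(n):
                    if i == j:
                        continue
                    values = {v[i] for v in consistent}
                    if len(values) == 1:
                        found.append((i, j, vj, x, values.pop()))
    return found


def and2(v): return v[0] & v[1]
def xor3(v): return v[0] ^ v[1] ^ v[2]

print(privacy_violations(and2, 2))
print(privacy_violations(xor3, 3))
```

For two-input AND, an agent holding input 1 learns the other agent's input from either output, so violations are reported. Three-input parity produces none, although it is excluded by Theorem 6 for a different reason (it is reversible).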

Theorem 6 If privacy is ranked over correctness, and both are ranked over exclusivity, then a
function is NCC if and only if it is non-reversible, non-dominated, and has no privacy violations.

It is interesting to note the relationship between privacy violations and what we call conditional
(local) domination. We say that agent $i$ conditionally dominates $f$ given agent $j$ if
$\exists v_i, v_j, x\; (\forall v_{-ij}\; f(v_i, v_j, v_{-ij}) = x) \wedge (\exists v_{-ij}\; f(\neg v_i, v_j, v_{-ij}) \ne x)$.
Using the terminology we defined earlier, conditional domination
occurs when agent $j$ can submit an input $v_j$ such that agent $i$ both dominates and is relevant to the
output of the conditional function $f_{-j}(v_{-j}) = f(v_j, v_{-j})$.

Lemma 7 There exists a privacy violation for agent $i$ by agent $j$ if and only if agent $i$ conditionally
dominates $f$ given $j$.

3.3  Voyeurism  first,  correctness  second 

The  final  two  cases  to  consider  are  those  in  which  the  first  two  considerations  are  voyeurism  and 
correctness,  in  that  order.  If  there  exists  an  agent  j  who  can  obtain  a  greater  amount  of  voyeurism, 
on  expectation,  from  one  of  his  possible  inputs,  then  he  will  always  choose  to  declare  this  input. 

Thus, a necessary condition for the function to be NCC is that the expected voyeurism be equal for
$v_j = 0$ and $v_j = 1$. If this is the case, then correctness becomes paramount, and we again have the
classic  NCC  condition. 

Formally, define a new indicator function $violate(i, j, v_j, x)$ to be 1 if a privacy violation occurs
for agent i by agent j, and $v_j$ and x satisfy the condition for the violation to occur, and 0 otherwise.

Now,  we  can  formally  give  the  condition  for  a  voyeurism  tie. 

Theorem  8  If  all  agents  rank  voyeurism  first  and  correctness  second,  then,  given  a  prior  P(V),  a 
function  is  NCC  if  and  only  if  it  is  non-reversible  and  non-dominated  and  the  following  condition 
for  a  voyeurism  tie  holds  for  each  agent  j: 

$\sum_{i \neq j} \sum_{v_{-j}} P(v_{-j}) \cdot violate(i, j, v_j = 1, f(v_j = 1, v_{-j})) = \sum_{i \neq j} \sum_{v_{-j}} P(v_{-j}) \cdot violate(i, j, v_j = 0, f(v_j = 0, v_{-j}))$
For the common prior P(0) = 1/2, an example of a function for which there is a voyeurism tie in
the presence of privacy violations is the unanimity function, in which f(v) = 1 if and only if the
inputs of all agents are identical.

Finally, note that the space of functions which are NCC in these two cases is a superset of the
functions which are NCC in the cases of the previous subsection (privacy over correctness), since a
complete lack of privacy violations trivially induces a voyeurism tie.
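The voyeurism tie for the unanimity function can be illustrated numerically. The sketch below is an assumption-laden reading of the tie condition: it sums, over the other agents i and over the other agents' inputs, the probability-weighted indicator that the observed output pins down $v_i$. The names `expected_voyeurism` and `unanimity` are hypothetical, and inputs are taken to be independent with common prior `p0 = P(0)`.

```python
from itertools import product

def unanimity(v):
    """1 if and only if all inputs are identical."""
    return 1 if len(set(v)) == 1 else 0

def expected_voyeurism(f, n, j, vj, p0):
    """Expected number of other agents whose input agent j learns with
    certainty when declaring vj, under independent inputs with P(0) = p0."""
    others = [k for k in range(n) if k != j]
    profiles = []
    seen = {}  # (i, output x) -> set of v_i values consistent with (vj, x)
    for rest in product((0, 1), repeat=n - 1):
        v = list(rest)
        v.insert(j, vj)          # rebuild the full profile with v_j fixed
        v = tuple(v)
        profiles.append(v)
        for i in others:
            seen.setdefault((i, f(v)), set()).add(v[i])
    total = 0.0
    for v in profiles:
        prob = 1.0
        for k in others:
            prob *= p0 if v[k] == 0 else 1.0 - p0
        x = f(v)
        total += prob * sum(1 for i in others if len(seen[(i, x)]) == 1)
    return total

for p0 in (0.5, 0.7):
    print(p0, expected_voyeurism(unanimity, 3, 0, 0, p0),
          expected_voyeurism(unanimity, 3, 0, 1, p0))
```

Under the uniform prior the two expectations coincide; under a skewed prior, declaring the more likely value yields more expected voyeurism, so the tie breaks.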

4  Partial  information  gain  setting 

Now  we  consider  the  partial  information  gain  setting,  in  which  agents  value  increased  information 
about  a  factor.  For  this  setting,  we  see  that  the  results  are  unchanged  for  many  of  the  possible 
lexicographic  orderings,  but  are  different  for  several  interesting  cases. 

4.1  Unchanged  results 

The  first  three  theorems  from  the  full  information  gain  setting  carry  over  exactly  to  this  setting. 

Theorem  9  In  the  partial  information  gain  setting,  any  function  that  is  reversible  or  dominated  is 
not  NCC. 

Theorem 10 In the partial information gain setting, if agents rank exclusivity over all other
factors, then no function is NCC.

Theorem 11 In the partial information gain setting, if agents rank correctness over all other
factors, then a function is NCC if and only if it is non-reversible and non-dominated.

4.2  Privacy  over  correctness 

For the cases in which privacy is ranked above correctness, the condition for non-cooperative
computability becomes more stringent, and it now depends on the form of the prior.

First, we need to update the definition of a privacy violation, because it now occurs whenever an
agent’s posterior distribution over another agent’s input differs at all from the prior distribution. We
say that a partial privacy violation of agent i by agent j occurs if
$\exists v_i, x: \; \Pr(v_j \mid v_i, f(v) = x) \neq P(v_j)$.

Theorem 12 In the partial information gain setting, if agents rank privacy over correctness, then,
given a prior P(V), a function is NCC if and only if it is non-reversible and non-dominated and
there are no partial privacy violations.

The  absence  of  partial  privacy  violations  can  also  be  formulated  by  the  following  condition, 
which,  in  words,  requires  that  no  pair  of  inputs  provide  more  information  about  the  output  than 
any  other  pair. 

Lemma 13 A function has no partial privacy violations if and only if it satisfies the following
condition:

$\exists c \;\; \forall i, j, v_i, v_j: \; \Pr(f(v) = 0 \mid v_i, v_j) = c$

A (relatively) simple function that satisfies this condition is
$f(v) = parity(v_1, v_2, v_3) \wedge parity(v_4, v_5, v_6)$,
with a common prior of P(0) = 1/2. There also exist privacy-preserving functions that treat each
agent’s input symmetrically. One example, for N = 7 and the common prior P(0) = 1/2, is the
function f(v) that returns 1 if and only if the number of agents i for which $v_i = 0$ is 1, 2, 4, or 5.
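Both example functions can be checked against the condition of Lemma 13 by enumeration. The sketch below assumes independent inputs with the uniform prior P(0) = 1/2; the helper names are hypothetical, and exact fractions are used so that equality of the conditional probabilities is tested without rounding.

```python
from fractions import Fraction
from itertools import product

def conditional_zero_probs(f, n):
    """Collect Pr(f(v) = 0 | v_i, v_j) over all pairs i != j and all values
    of (v_i, v_j), assuming independent uniform inputs."""
    probs = set()
    for i, j in product(range(n), repeat=2):
        if i == j:
            continue
        for vi, vj in product((0, 1), repeat=2):
            matching = [v for v in product((0, 1), repeat=n)
                        if v[i] == vi and v[j] == vj]
            zeros = sum(1 for v in matching if f(v) == 0)
            probs.add(Fraction(zeros, len(matching)))
    return probs

def two_parities(v):
    """parity(v1, v2, v3) AND parity(v4, v5, v6), with 0-based indices."""
    return (v[0] ^ v[1] ^ v[2]) & (v[3] ^ v[4] ^ v[5])

def zero_count_rule(v):
    """1 iff the number of agents with input 0 is 1, 2, 4, or 5 (N = 7)."""
    return 1 if v.count(0) in (1, 2, 4, 5) else 0

print(conditional_zero_probs(two_parities, 6))
print(conditional_zero_probs(zero_count_rule, 7))
```

A single value in the returned set means the constant c of Lemma 13 exists; a function such as three-input AND produces several distinct values and therefore fails the condition.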
4.3 Voyeurism first, correctness second


The  last  two  cases  to  consider  are  those  in  which  agents  rank  voyeurism  first  and  correctness  second, 
leaving  privacy  as  their  third  or  fourth  priority. 

In order to calculate the expected entropy that agent j has over agent i’s input after learning the
output of the function, we can use the following expression, which is Bayes’ rule for independent inputs:
$\Pr(v_i \mid f(v) = x, v_j) = \frac{\Pr(f(v) = x \mid v_i, v_j) \, P(v_i)}{\Pr(f(v) = x \mid v_j)}$.

For the desired equilibrium to hold, it must be the case that for each agent i the expected partial
voyeurism is the same for both possible values of $v_i$.


Theorem 14 In the partial information gain setting, if agents rank voyeurism first and correctness
second, then, given a prior P(V), a function is NCC if and only if it is non-reversible and
non-dominated and the following condition holds for each agent j:

$\sum_{i \neq j} E_x[H(v_i \mid v_j = 1, f(v) = x)] = \sum_{i \neq j} E_x[H(v_i \mid v_j = 0, f(v) = x)]$

Note that, for the common prior P(0) = 1/2, the unanimity function still induces a voyeurism tie,
as it did for the same two orderings of the full information gain setting.
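The partial voyeurism tie for the unanimity function can be illustrated in the same way. The sketch below assumes that the tie condition compares, for $v_j = 0$ and $v_j = 1$, the sum over the other agents i of the expected posterior entropy $E_x[H(v_i \mid v_j, f(v) = x)]$; this reading, the function names, and the independence of inputs with common prior `p0 = P(0)` are all assumptions of the sketch.

```python
from itertools import product
from math import log2

def unanimity(v):
    """1 if and only if all inputs are identical."""
    return 1 if len(set(v)) == 1 else 0

def expected_posterior_entropy(f, n, j, vj, p0):
    """Sum over i != j of E_x[H(v_i | v_j = vj, f(v) = x)] in bits."""
    others = [k for k in range(n) if k != j]
    p_x = {}    # output x -> Pr(f(v) = x | v_j = vj)
    joint = {}  # (x, i, v_i) -> Pr(f(v) = x, v_i | v_j = vj)
    for rest in product((0, 1), repeat=n - 1):
        v = list(rest)
        v.insert(j, vj)
        prob = 1.0
        for k in others:
            prob *= p0 if v[k] == 0 else 1.0 - p0
        x = f(tuple(v))
        p_x[x] = p_x.get(x, 0.0) + prob
        for i in others:
            key = (x, i, v[i])
            joint[key] = joint.get(key, 0.0) + prob
    total = 0.0
    for (x, i, vi), mass in joint.items():
        if mass > 0.0:
            # P(x) * q * log2(q) with q = mass / P(x) simplifies to mass * log2(q)
            total -= mass * log2(mass / p_x[x])
    return total

for p0 in (0.5, 0.7):
    print(p0, expected_posterior_entropy(unanimity, 4, 0, 0, p0),
          expected_posterior_entropy(unanimity, 4, 0, 1, p0))
```

Under the uniform prior the two sides agree by symmetry; a skewed prior makes the remaining uncertainty depend on which value agent j declares.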


5  Conclusion 

In  this  paper,  we  have  considered  a  class  of  incentive  structures  for  agents  and  a  class  of  mechanisms, 
and  characterized  the  sets  of  functions  which  are  computable  by  agents  which  are  similarly  self- 
interested.  A  summary  of  our  results  lends  itself  to  a  decision  tree,  as  shown  in  Figure  1. 

We  view  these  results  as  laying  the  groundwork  for  a  consideration  of  a  wide  variety  of  both 
theoretical  concerns  and  practical  problems.  In  the  introduction,  we  discussed  the  cryptographic 
problem  of  secure  function  evaluation.  Determining  whether  these  cryptographic  protocols  will  lead 
to  successful  computation  requires  considering  not  only  deviations  from  the  protocol  given  agent 
inputs (which is the extent of the analysis in most papers in this field), but also whether the protocol
is  incentive  compatible.  Using  the  impossibility  results  stated  above,  we  can  focus  our  efforts  on 
designing  protocols  for  functions  which  are  non-cooperatively  computable. 

In  addition,  we  can  extend  our  formulation  along  several  dimensions.  For  example,  if  we  allow 
the  mechanism  to  return  to  an  agent  the  inputs  of  other  agents,  in  addition  to  the  output  of  the 
function,  then  voyeurism  no  longer  prevents  a  function  from  being  NCC.  Intuitively,  the  mechanism 
will  expose  inputs  in  a  way  that  always  creates  a  voyeurism  tie.  While  similar  solutions  cannot 
overcome  privacy  or  exclusivity,  other  extensions,  including  correlation  between  agent  inputs  and 
the  possibility  of  monetary  payments,  further  expand  the  space  of  functions  that  are  NCC.  Finally, 
the  analysis  and  types  of  results  we  obtained  are  not  limited  to  Boolean  functions  and  lexicographic 
utility  functions,  and  we  regard  extending  this  analysis  to  more  general  fields  as  a  promising  line  of 
future  research. 
Figure 1: A decision tree which summarizes the conditions for a function and prior to be NCC. The
conditions  are  the  same  for  the  two  settings  we  consider,  except  for  the  bottom  two  decision  boxes, 
in  which  “(partial)”  refers  to  the  updated  conditions  for  the  partial  information  gain  setting. 
Abstract

We consider how much influence a center can exert on a game
if its only power is to propose contracts to the agents before
the original game, and enforce the contracts after the game
if all agents sign it. Modelling the situation as an extensive-form
game, we note that the outcomes that are enforceable are
precisely those in which the payoff to each agent is higher
than its payoff in at least one of the Nash equilibria of the
original game. We then show that these outcomes can still be
achieved without any effort actually expended by the center:
we propose a mechanism in which the center does not monitor
the game, and the contracts are written so that in equilibrium
all agents sign and obey the contract, with no need for
center intervention.

Introduction 

There has been much interest in AI in mechanism design, the
area of game theory devoted to designing protocols for self-interested
agents. In the literature (Mas-Colell, Whinston,
& Green 1995) it is generally assumed that the mechanism
designer has complete freedom in designing the rules of the
game. Yet the world is full of strategic situations with rules
that already exist and cannot be changed arbitrarily. Recent
work on k-implementation (Monderer & Tennenholtz 2003)
restricts the capabilities of the mechanism designer in a particular
way - it can add to any given cell in the payoff matrix,
but it cannot subtract. (The interesting results in that
line of work concern cases in which, despite that addition,
the cost to the center in equilibrium is zero.) The opposite
of this setting would be one in which the center can impose
fines, rather than bonuses. This in and of itself is not interesting,
because with sufficiently large fines any outcome
can be enforced. However, suppose the mechanism cannot
unilaterally impose fines, but it can do so in the context of
a signed contract. Specifically, we consider the following
class of mechanisms. Given a game G, the center can:

1. Propose a contract before G is played. This contract specifies
a particular outcome, that is, a unique action for each
agent, and a penalty for deviating from it.

2. Collect signatures on the contract and make it common
knowledge who signed.

3. Monitor the players’ actions during the execution of G.

4. If the contract was signed by all agents, fine anyone who
deviated from it as specified by the contract.

*This work was supported in part by the National Science Foundation
under ITR IIS-0205633 and by DARPA grant F30602-00-2-0598.

Our  setting  is  reminiscent  of  the  work  on  social  laws  and 
conventions  (Shoham  &  Tennenholtz  1997).  There  too  the 
center  can  offer  a  social  convention  to  the  players,  where 
each  player  agrees  to  a  particular  outcome  so  long  as  the 
other  players  play  their  part.  The  difference  is  that  in  that 
work  it  is  assumed  that,  once  all  agents  agree,  the  center  has 
the  power  to  enforce  that  outcome.  Here  we  assume  that 
players  still  have  the  freedom  to  choose  whether  or  not  to 
honor  the  agreement;  the  challenge  is  to  design  a  mechanism 
such  that,  in  equilibrium,  they  will. 

The technical results of this paper will refer to games of
complete information, but for intuition consider the example
of online auctions, such as those conducted by eBay. Consider
the complete game being played, including the decision
after the close of the auction by the seller of whether
to deliver the good and by the buyer of whether to send
payment. Straightforward analysis shows that the equilibrium
is for neither to keep his promise, and the experience
with fraud on eBay (Hafner 2004) demonstrates that
the problem is not merely theoretical. It would be in eBay’s
interest to find a way to enable its customers to bind themselves
to their promises.

Our first interest will be to characterize the achievable outcomes:
What outcomes may the center suggest, with associated
penalties, that the agents will accept? Our first result
will be an observation that the center’s power is quite broad:
Any outcome will be accepted when accompanied by appropriate
fines, so long as the payoffs of each agent in that
outcome are better than that player’s payoffs in some equilibrium
of the original game.

Although the center can achieve almost any outcome, we
note that the helpful center expends a large amount of effort
to do so: suggesting an outcome, collecting signatures,
observing the game, and enforcing the contracts. If this procedure
happens not just for one game, but for hundreds or
thousands per day, the center may wish to find a way to avoid
this burden while still achieving the same effect.
The bulk of this paper concerns ways in which this reduction
in effort can be achieved. We continue to assume that
the center still needs to propose a contract. We also simply
assume that it does not monitor the game. Nor does it participate
in the signing phase; the agents do that among themselves
using a broadcast channel. While we might imagine
that the players could simply broadcast their signatures, this
protocol allows a single player to learn the others’ signatures
and threaten them with fines. Nonetheless, we can construct
a more complicated protocol - using a second stage of contracts -
which does not require the center’s participation. The
only phase in which the center’s protocol requires it to get
involved under some conditions is the enforcement stage.
However, our goal will be to devise contracts so that, in equilibrium,
at this stage too the center sits idle. Our results here
will be as follows. If the game play is verifiable (if the center
can discover after the fact how the game played out), we
can achieve all of the outcomes achievable by a fully engaged
center. If the game is not verifiable then we can still
achieve all previously achievable outcomes with some contract,
but that contract might allow equilibria with additional
outcomes.

The rest of the paper is organized as follows: we first formally
define our setting. Then we characterize the set of
outcomes which are achievable with a busy center using contracts
in this game. Finally, we lighten the load on the center
first in the enforcement stage, then in the signature exchange
stage.

Formal  Setting 

The strategic situation the center wishes to influence can be
characterized as a strategic-form game with consequences in
O: G = (N, A, O, g, V). (We roughly follow the notation of
(Osborne & Rubinstein 1994).) Here N is the set of players
{1, ..., n}. $A = A_1 \times A_2 \times \dots \times A_n$, where $A_i$ is the set
of actions which can be taken by an individual agent i. We
will use $a_i$ to refer to an action of i in G and $a_{-i}$ to refer
to the vector of actions of all other players. O is the space
of outcomes; $g : A \to O$ determines the outcome after an
action profile. We identify each outcome $o_{(a_i, a_{-i})}$ with a
distinct action profile $(a_i, a_{-i})$, and assume $g(a_i, a_{-i}) = o_{(a_i, a_{-i})}$.
$V = V_1 \times V_2 \times \dots \times V_n$, where $V_i : O \to \mathbb{R}$ is the
payoff function for player i.

Before this strategic situation G occurs, the helpful center
suggests a contract to the players. This contract specifies the
outcome o suggested by the center and what actions h the
center will take in response to different action profiles of the
players. The center will not enforce this contract unless it
is signed by all players. This contract defines the center’s
protocol in the enforcement stage H, as described below.
We will denote a contract that describes a particular center
protocol h as $c_h$.

Now  we  will  describe  the  stages  of  the  game,  as  initially 
formulated.  We  will  adjust  this  formulation  in  later  sections 
so  that  the  center  does  no  work  in  equilibrium. 

Signature Exchange Stage F Each player who assents
sends his signature on the contract to the center, who collects
them. The center notifies all players of the identities
of the signers. At the end of this stage, it is common
knowledge whether or not the contract will be enforced.

Execution Stage G The players play the game G. Each
player may take his action $a_i$ to achieve o or he may not.
The center observes the actions taken by the players.

Enforcement Stage H The center takes the actions specified
in the contract in response to the actions he observed.

The outcomes are a consequence of the execution stage,
but the only way the center can affect the players’ actions
there is by fining them in the enforcement stage.

The extended game which arises from playing the stage
games one after another we denote by $X = F \cdot G \cdot H$.
Together, these define an extensive-form game with simultaneous
moves. In general, an extensive-form game X can
be defined as $X = (N, \Omega, A_\omega, P, U)$, where N is again the
set of players, $\Omega$ is the set of histories of actions taken, $A_\omega$ is the
set of actions for all players that can be taken after history
$\omega$, $P : \Omega \to 2^N$ is the player function that defines which
players get to move after a given history, and U is the utility
function of players in the entire game.

In our particular setting, the history is just the set of
actions taken in each stage game played so far, $A_\omega$ is the
set of actions possible in each stage game following history
$\omega$, P is the set of all players (all players move simultaneously
in each stage), and U is the (undiscounted) sum of
the utilities of each stage game. We denote the subgame
$(N, \Omega|_\omega, A|_\omega, P|_\omega, U|_\omega)$ of X that arises after history $\omega$ by
$\Gamma(\omega)$, which simply refers to the play of the remaining stage
games following the actions taken in $\omega$. In later sections, we
will refer to the strategy space of stage F as $\Sigma^F$, of stage G
as $\Sigma^G$, and of stage H as $\Sigma^H$.

A pure strategy $\sigma_i$ for player i in a strategic-form game
corresponds to the choice of a single action $\sigma_i \in A_i$. A
mixed strategy corresponds to the choice of a distribution
over actions: $\sigma_i \in \Delta A_i$. A pure strategy in an extensive-form
game is defined as $\sigma_i : \Omega \to A_\omega$; a mixed strategy is
defined accordingly. If $\sigma_i$ is a strategy in X, then the strategy
$\sigma_i|_\omega : \Omega|_\omega \to A|_\omega$ induced by $\sigma_i$ in the subgame $\Gamma(\omega)$
is $\sigma_i|_\omega(\omega') = \sigma_i((\omega, \omega'))$. Since it is unobservable whether a
player has played a particular mixed strategy (only the realization
of that strategy is observed), we will henceforth
concentrate on enforcing outcomes that are the consequence
of pure strategy profiles.

Our chosen solution concept will be subgame perfect
equilibrium. To define this, we must first define a Nash equilibrium:
a profile of strategies $\sigma$ is a Nash equilibrium in a
game if $\forall i \in N, \sigma'_i \in \Delta A_i : U_i(\sigma_i, \sigma_{-i}) \geq U_i(\sigma'_i, \sigma_{-i})$.

A profile of strategies $\sigma$ in an extensive-form game is a
subgame perfect equilibrium if for every $\omega \in \Omega$, $\sigma|_\omega$ is a
Nash equilibrium of the subgame $\Gamma(\omega)$. A subgame perfect
equilibrium is resistant to deviations by players even in subgames
off the equilibrium path.

The  Power  of  a  Helpful  Center 

We wish to characterize the power of a helpful center without
any resource limitations. In this section, the center
is limited only by the voluntary consent required from all
agents and by its lack of desire to spend its own money.
Specifically, we assume that it collects the signatures in F
itself and that it monitors the players’ actions in G. In later
sections we will relax each of these two assumptions.

First, we must precisely define the game which is being
played. We model the signature exchange stage as a game
form F with players N and action space $\Sigma^F = \{0, 1\}^n$.
Player i assents to the contract if his action $\sigma_i^F \in \Sigma_i^F$ equals 1. Since the
center broadcasts the identities of the signers, each player’s
action is common knowledge. The execution game G thus
has an extended action space $\Sigma^G : \{0, 1\}^n \to A$ in which
players decide which action to take based on the consequences of
the signature exchange stage. In the initial formulation, the
enforcement stage H requires no action on the part of the
players, but only of the center. The center’s protocol h sets
the payoff function of the enforcement stage. The center
observes the signatures it receives and the actions chosen by
the players and chooses to fine or reward players. Formally,
$h = h_1 \times h_2 \times \dots \times h_n$ and $h_i : \{0, 1\}^n \times O \to \mathbb{R}$.

We define the payoff function $U_i : \Sigma \to \mathbb{R}$ for each
player in the extended game X given actions $v \in \{0, 1\}^n$
and $(a_i, a_{-i}) \in A$ as $U_i(v, a_i, a_{-i}) = V_i(g(a_i, a_{-i})) + h_i(v, g(a_i, a_{-i}))$.
Thus each player has a quasi-linear utility
function over the outcome determined in G and the money
taken or given by the center according to h.

We say that the center’s protocol h is voluntary if the center
neither fines nor rewards players if the contract is not
signed by every player: for all $v \in \{0, 1\}^n$ with $v \neq 1^n$ and for all
$o \in O$, it is the case that $h_i(v, o) = 0$. We say that h is frugal
if the center never spends its own money: for all $v \in \{0, 1\}^n$
and all $o \in O$, it is the case that $\sum_{i \in N} h_i(v, o) \leq 0$. As
these capture the limitations on the helpful center in our setting,
we will henceforth limit h to be frugal and voluntary.
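The two restrictions and the quasi-linear payoff can be made concrete on a small example. The sketch below uses a hypothetical two-player Prisoner's Dilemma; the payoff numbers, the fine size M, and all function names are illustrative assumptions, and outcomes are identified with action profiles as in the formal setting.

```python
from itertools import product

ACTIONS = ("C", "D")
N_PLAYERS = 2
OUTCOMES = list(product(ACTIONS, repeat=N_PLAYERS))     # outcome = action profile
SIGNATURES = list(product((0, 1), repeat=N_PLAYERS))    # v in {0,1}^n

# Hypothetical payoffs V[o][i] for a Prisoner's Dilemma.
V = {("C", "C"): (3, 3), ("C", "D"): (0, 4),
     ("D", "C"): (4, 0), ("D", "D"): (1, 1)}

def fine_protocol(target, M):
    """Center protocol h: fine each deviator M, but only if everyone signed."""
    def h(v, o):
        if any(s == 0 for s in v):
            return (0,) * N_PLAYERS
        return tuple(-M if o[i] != target[i] else 0 for i in range(N_PLAYERS))
    return h

def is_voluntary(h):
    """No transfers at all unless the contract was signed by every player."""
    unsigned = [v for v in SIGNATURES if v != (1,) * N_PLAYERS]
    return all(t == 0 for v in unsigned for o in OUTCOMES for t in h(v, o))

def is_frugal(h):
    """The center never pays out more than it collects."""
    return all(sum(h(v, o)) <= 0 for v in SIGNATURES for o in OUTCOMES)

def extended_payoff(h, v, o, i):
    """U_i = V_i(outcome) + h_i(signatures, outcome)."""
    return V[o][i] + h(v, o)[i]

h = fine_protocol(("C", "C"), M=2)
print(is_voluntary(h), is_frugal(h))
print(extended_payoff(h, (1, 1), ("D", "C"), 0))   # deviator is fined
print(extended_payoff(h, (1, 0), ("D", "C"), 0))   # unsigned contract, no fine
```

A protocol that paid every player a bonus would fail both checks, which is why bonuses are outside the class of mechanisms considered here.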

We first wish to characterize what outcomes can occur in a
subgame-perfect equilibrium of the extended game X. The
outcome depends on two things: the contract $c_h$ suggested
by the center and the strategies of the players in X. We wish
to find contracts to which the players will agree that ensure
that our chosen outcome is played.

In order to characterize the space of possible outcomes
which can be enforced, we must define the notion of a punishment
equilibrium. $p^i$ is a punishment equilibrium for i if
$p^i$ is the Nash equilibrium of G with minimal payoffs for i
among all (mixed) Nash equilibria of G.

Theorem 1 Let $p^i$ be the punishment equilibrium for i. For
all $o_{(a_i, a_{-i})}$, if $V_i(a_i, a_{-i}) \geq V_i(p^i)$ for every player i, then there exists a
voluntary and frugal center protocol h and a subgame perfect
equilibrium $\pi^*$ in which all players agree to $c_h$ and play
$(a_i, a_{-i})$, and in no subgame perfect equilibrium do players
agree to $c_h$ and then fail to play $(a_i, a_{-i})$. Furthermore, for
all i, $U_i(\pi^*) = V_i(a_i, a_{-i})$. If $V_i(a_i, a_{-i}) < V_i(p^i)$ for some player i, then
there is no subgame perfect equilibrium in which $(a_i, a_{-i})$
is played.

Proof: First, suppose $V_i(a_i, a_{-i}) < V_i(p^i)$. Since player
i will get at least $V_i(p^i)$ in any subgame perfect equilibrium
without fines, i can profit by withholding his assent.
As $(a_i, a_{-i})$ cannot be a Nash equilibrium by assumption
and no fines are assessed in H, there can be no subgame
perfect equilibrium in which $(a_i, a_{-i})$ is played.

Second, suppose $V_i(a_i, a_{-i}) \geq V_i(p^i)$. We choose
$h_i(a_i) = 0$ and $h_i(a'_i \neq a_i) = -M$. If we choose
M so that for all i, $a'_i$, and $a'_{-i}$, it is the case that
$M > V_i(a'_i, a'_{-i}) - V_i(a_i, a_{-i})$, then $(a_i, a_{-i})$ will be the only subgame
perfect equilibrium of the subgame $G \cdot H$, supposing
all players agree to $c_h$. We also require that all players assent
in F. If any player does not assent, all players coordinate on
his punishment equilibrium $p^i$ in G. If more than one player
fails to assent, we break ties arbitrarily to see which $p^i$ is
played. No matter which player fails to assent, $p^i$ will be a
subgame perfect equilibrium of $G \cdot H$, since the center will
not assess fines. Thus i will not profit by withholding his
assent. □
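The choice of M in the proof can be traced on a small example. The sketch below uses a hypothetical Prisoner's Dilemma (the payoff numbers and function names are illustrative, not from the paper) and checks only pure-strategy equilibria of the one-shot game with fines folded in, which is a simplification of the subgame perfect analysis above.

```python
from itertools import product

ACTIONS = ("C", "D")
# Hypothetical payoffs V[a] = (V_1, V_2) for a Prisoner's Dilemma.
V = {("C", "C"): (3, 3), ("C", "D"): (0, 4),
     ("D", "C"): (4, 0), ("D", "D"): (1, 1)}

def pure_nash(payoff):
    """All pure action profiles from which no player gains by a unilateral deviation."""
    equilibria = []
    for a in product(ACTIONS, repeat=2):
        stable = True
        for i in range(2):
            for dev in ACTIONS:
                b = list(a)
                b[i] = dev
                if payoff(tuple(b))[i] > payoff(a)[i]:
                    stable = False
        if stable:
            equilibria.append(a)
    return equilibria

def fined(target, M):
    """Payoffs of G once the signed contract fines each deviator from target by M."""
    def payoff(a):
        return tuple(V[a][i] - (M if a[i] != target[i] else 0) for i in range(2))
    return payoff

target = ("C", "C")
# Any M strictly above the largest possible payoff gap satisfies the proof's bound.
M = 1 + max(V[b][i] - V[target][i] for b in V for i in range(2))

print(pure_nash(lambda a: V[a]))      # equilibria of the original game
print(M, pure_nash(fined(target, M)))  # equilibria once the contract is signed
```

In the original game only mutual defection is stable; once the contract is signed, the fine makes the target action strictly better for each player whatever the other does. Each player also earns 3 under the contract against 1 in the punishment equilibrium, so assenting is worthwhile.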

Thus  we  show  that,  with  a  fully  engaged  center  that  takes 
part  in  the  protocol  and  monitors  the  players’  actions,  we 
can  achieve  any  payoffs  for  the  players  which  are  at  least 
as  good  for  every  player  as  some  Nash  equilibrium  of  G. 
Furthermore,  once  a  contract  for  o  is  mutually  signed,  the 
unique  subgame  perfect  equilibrium  achieves  o. 

We  notice  that,  already,  the  center  takes  no  action  in  H 
in  equilibrium.  Yet  as  the  center  takes  action  in  every  other 
stage,  we  shall  consider  how  to  lighten  the  load  on  the  center. 

Removing  the  Center  From  the  Enforcement 
Stage 

In this section, we will drop the assumption that the center
monitors the players’ actions in the execution stage
G. Instead, we assume that actions and outcomes are common
knowledge among the players but are not observed by
the center. The center must therefore encourage the players
to tell him if there has been a deviation. We will distinguish
two cases. In the verifiable case, the center can verify that
a particular player played a given action if he chooses to
do so once the game G has been played. Specifically, we
require that the center be able to verify, for each player i,
whether i played the correct action $a_i$ or some other action
$a'_i \neq a_i$. The center saves effort by not paying attention to
G; we merely require that he can determine the truth after
the fact, if necessary. In the unverifiable case, the center has
no information about players’ actions whatsoever.

Because we now require the center to be notified by the
players of deviations, the enforcement games we now consider
will be of the following form: first, the players observe
the outcome and send messages to the center. The center
publishes any messages he receives to all players. The players
then have the chance to respond to the center’s messages.
This repeats for some number of rounds. Finally, the center
makes monetary transfers between the players based on the
messages sent.

For our purposes, this full generality is not needed. Our
enforcement stage H is a single-round stage game where
each player chooses whether or not to complain about
other players by sending their names to the center, and
the center chooses a fine to impose on each player:
$H = (N, \Sigma^H, \mathbb{R}^n, h)$. $\sigma_i^H \in \Sigma_i^H : O \to 2^N$ specifies which
complaints player i will send to the center after each outcome.
As before, h is the center’s protocol, which maps outcomes
and complaints received to monetary consequences
in $\mathbb{R}^n$. The center may make payments based on the outcome
(if he can verify it), the identities of the complainers,
and the target of their complaints. In the verifiable case,
$h : O \times (2^N)^n \to \mathbb{R}^n$, while in the unverifiable case,
$h : (2^N)^n \to \mathbb{R}^n$.

Now that we have specified an enforcement game, we
wish to characterize the set of outcomes obtainable thereby
in the extended game corresponding to this enforcement
game.

We define a protocol $h_o$ for the center, which will induce
an equilibrium under which the center takes no action in
the enforcement stage. Let M and m be a large and small
amount of money, respectively. In $h_o$, the center punishes
each player who deviated by a large-enough amount M, but
also rewards each player who sent in a correct complaint by
m for each correct complaint. The center also punishes any
player who sent in an incorrect complaint by m. The contract
that specifies center protocol $h_o$ we call $c_{h_o}$.
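The transfers made by this protocol can be sketched as a small function. This is an illustrative reading, not the paper's formal definition: the function name and argument layout are hypothetical, and the sketch assumes that the center verifies (and therefore fines) a deviator only if at least one complaint names him, since the center does not observe G directly.

```python
def verifiable_transfers(actions, target, complaints, M, m):
    """Transfers under the complaint-driven protocol for the verifiable case.

    actions, target: tuples of actions, one entry per player.
    complaints: list of sets; complaints[i] holds the players named by player i.
    M: large fine for a verified deviation; m: small reward or penalty per complaint.
    """
    n = len(actions)
    deviated = {k for k in range(n) if actions[k] != target[k]}
    named = set().union(*complaints)     # players that someone complained about
    transfers = [0] * n
    for k in deviated & named:           # assumption: verification follows a complaint
        transfers[k] -= M
    for i, targets in enumerate(complaints):
        for k in targets:
            # Correct complaints earn m; incorrect ones cost m.
            transfers[i] += m if k in deviated else -m
    return transfers

target = ("C", "C")
print(verifiable_transfers(("C", "D"), target, [{1}, set()], M=10, m=1))
print(verifiable_transfers(("C", "C"), target, [{1}, set()], M=10, m=1))
print(verifiable_transfers(("C", "C"), target, [set(), set()], M=10, m=1))
```

When nobody deviates and nobody complains, all transfers are zero, which matches the intended equilibrium in which the center sits idle. The transfers sum to at most zero as long as the total reward paid per deviator stays below M.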

Theorem 2 (Contracts for Verifiable Games) Let G be a
game with verifiable consequences in O and let $o_{(a_i, a_{-i})} \in O$
be the desired outcome. Assume that the center has suggested
contract $c_{h_o}$ defined above and consider the subgame
$G \cdot H$ that follows unanimous agreement to this contract.
Then there is a strategy profile $\pi^*$ such that $\pi^*$ is the unique
subgame perfect equilibrium of $G \cdot H$, $o_{(a_i, a_{-i})}$ is the equilibrium
outcome of $\pi^*$, and $\pi^*$ has payoffs $V(a_i, a_{-i})$. The
center takes no action if $\pi^*$ is played.

[Proof  omitted.] 

We now consider the unverifiable case. As before, we first
define a particular center protocol $h'_o$. In $h'_o$, the center punishes
the target of each complaint by a large-enough amount
M, but does not reward or punish players for complaints.
After all, the center cannot distinguish valid complaints from
invalid ones. The contract that specifies center protocol $h'_o$
we call $c_{h'_o}$.

Theorem 3 (Contracts for Unverifiable Games) Let G be
a game with unverifiable consequences in O, and let
$o_{(a_i, a_{-i})} \in O$ be the desired outcome. Assume that the center
has suggested contract $c_{h'_o}$ and consider the subgame
$G \cdot H$ that follows unanimous agreement to this contract.
Then there is a strategy profile $\pi^{*\prime}$ such that $\pi^{*\prime}$ is a subgame
perfect equilibrium of $G \cdot H$, $o_{(a_i, a_{-i})}$ is the equilibrium
outcome of $\pi^{*\prime}$, and $\pi^{*\prime}$ has payoffs $V(a_i, a_{-i})$. The
center takes no action if $\pi^{*\prime}$ is played.

[Proof  omitted.] 

We have seen that even without verifiability, it is possible
to achieve almost any outcome in equilibrium. Unfortunately,
these equilibria are no longer unique. As we shall
see, in the unverifiable case, a given signed contract may
have many possible equilibrium outcomes rather than just
the intended one.

Given a game G, define the shortfall $s_i^a$ of pure-strategy
profile $a = (a_i, a_{-i})$ for player i as
$s_i^a = \max_{a'_i} V_i(a'_i, a_{-i}) - V_i(a_i, a_{-i})$.
The shortfall of i in a is the amount by which i's payoff
would need to rise so that i would have no incentive
to deviate from a, all else held constant. We can see that
there must be some equilibrium of the enforcement game
in which an agent i is punished by at least $s_i^{(a_i, a_{-i})}$
whenever he deviates from his action $a_i$. Yet, in an unverifiable
game, there is nothing in the center's protocol which makes
$(a_i, a_{-i})$ special. The players could just as well coordinate
on this equilibrium in the enforcement game when the actions
are instead some other profile $(a'_i, a'_{-i})$. This implies that
any enforcement scheme for the unverifiable case will not in
general have a unique outcome. Here we consider not only
our chosen center protocol $h'_o$, but in fact any center protocol
h in any form of enforcement game H.
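As a concrete illustration of the shortfall, the following minimal Python sketch computes it for a two-player game given as payoff matrices. The function name, the matrix layout, and the Prisoner's Dilemma payoffs are illustrative assumptions, not taken from the original.

```python
def shortfall(payoff, profile, player):
    """Shortfall of `player` at pure profile `profile` in a two-player game.

    payoff[p][a0][a1] is player p's payoff when player 0 plays a0 and
    player 1 plays a1 (hypothetical layout chosen for this sketch).
    """
    a0, a1 = profile
    current = payoff[player][a0][a1]
    if player == 0:
        # best unilateral deviation of player 0, holding a1 fixed
        best = max(payoff[0][alt][a1] for alt in range(len(payoff[0])))
    else:
        # best unilateral deviation of player 1, holding a0 fixed
        best = max(payoff[1][a0][alt] for alt in range(len(payoff[1][a0])))
    return best - current


# Illustrative Prisoner's Dilemma: action 0 = cooperate, 1 = defect.
PD = [
    [[3, 0], [5, 1]],  # payoffs of player 0
    [[3, 5], [0, 1]],  # payoffs of player 1
]
```

In this example each player has a shortfall of 2 at mutual cooperation and a shortfall of 0 at mutual defection. A profile is a pure Nash equilibrium exactly when every player's shortfall is zero.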

Theorem 4 (Spurious Equilibria) Consider an unverifiable
enforcement game with a frugal and voluntary center protocol h
under which G · H has a subgame-perfect equilibrium $\pi$ in which
the center does no work, where $a = (a_i, a_{-i})$ is the strategy
profile that $\pi$ plays in G. If $a'$ is a pure strategy profile
of G and $\forall i : s_i^{a'} \le s_i^{a}$, then there exists a subgame
perfect equilibrium $\pi'$ of G · H such that $a'$ is the strategy
profile that $\pi'$ plays in G.

[Proof  omitted.] 

A consequence of this theorem is that any Nash equilibrium
of G can be played in F · G · H regardless of the contract
signed.

Corollary 5 (No Deletion) If $a$ is a pure or mixed Nash
equilibrium in the unverifiable game G, then, for any frugal
and voluntary center protocol h that has a subgame perfect
equilibrium $\pi$ where the center does no work, there is a
subgame perfect equilibrium $\pi'$ of G · H such that $a$ is the
strategy profile that $\pi'$ plays in G.

[Proof  omitted.] 

Thus,  if  the  center  cannot  verify  the  players’  actions,  he 
cannot  in  general  enforce  any  outcomes  uniquely.  After 
signing  the  contracts,  the  players  might  arrive  at  an  outcome 
different  from  the  one  the  center  suggested.  In  a  real-world 
setting,  this  would  substantially  weaken  the  case  that  the 
players  should  sign  the  contract. 

We have shown that a helpful center who neither monitors
the players' actions nor fines any player in equilibrium
can enforce every outcome that a fully engaged center can
enforce with more burdensome contracts. In an unverifiable
game, however, the center must generally accept spurious
equilibria. Our next task is to remove the center from the
signature exchange stage.

Exchanging  Signatures  Without  The  Center 

Under the original contract, the center collected signatures
on the contract $c_h$ and enforced the contract if every player
signed. We now show how the players can exchange signatures
on the contract by use of a broadcast channel without
requiring any action from the center in equilibrium. In this,
our goal is similar to the goal of optimistic signature
exchange (Garay & MacKenzie 1999), but with rational actors
instead of computationally-bounded ones.

If players may communicate without being observed by
others, F would be a game of imperfect information. As
these games are difficult to analyze and generally admit
many solutions, we require the players to use a broadcast
channel, on which all messages sent are common knowledge.

When the center no longer monitors the signature exchange
stage, he no longer knows in the enforcement stage
H whether the contracts have been signed or not. Therefore,
we now require that each complaint sent to the center
in the enforcement stage H include a fully signed copy of
the contract.

The  Naive  Broadcast  Protocol 

We might hope that the signature collection service performed
by the center was superfluous: that we will achieve
the same results if we simply require players to broadcast
their agreement or disagreement. Unfortunately, this
will not be so. Consider the naive broadcast protocol
where all players simultaneously broadcast their signatures.
Let us formally define F to be the one-round stage game
$F = (N, T_F, S, f)$, where N is the set of players and
$T_F = \{0,1\}^n$, where 0 represents the decision not to broadcast
one's signature, while 1 represents the decision to do
so. $S = (\{0,1\}^n)^n$ is the set of outcomes of the game.
Each outcome specifies the set of signatures (represented by
$\{0,1\}^n$) possessed by each player in N. $f : T_F \to S$ is the
outcome function of F: each player knows his own signature
and every signature which is broadcast.

The  complete  set  of  signatures  is  thus  common  knowledge 
if  and  only  if  every  player  chooses  to  broadcast  his  signature. 
Consider  what  occurs  if  exactly  one  player  i  fails  to  reveal 
his  signature:  i  has  received  all  the  signatures  of  the  other 
players,  and  he  can  produce  his  own.  Thus  i  is  the  only 
player  to  possess  all  signatures  on  the  contract,  and  this  fact 
is  common  knowledge  among  the  players.  The  center,  on 
the  other  hand,  cannot  distinguish  this  case  from  the  case 
where  all  players  know  all  signatures,  but  only  i  chooses 
to  complain.  Therefore  i  is  able  to  unilaterally  enforce  the 
contract,  unlike  in  the  original  formulation. 
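The outcome function of the naive broadcast protocol is easy to sketch in Python. The function names and the list-of-lists encoding of an outcome are assumptions made for this illustration; row i of the outcome records which signatures player i holds.

```python
def broadcast_outcome(decisions):
    """Outcome of the naive broadcast stage.

    decisions[j] is 1 if player j broadcasts his signature, else 0.
    Returns a matrix `known` with known[i][j] == 1 when player i holds
    player j's signature: every player knows his own signature plus
    every signature that was broadcast.
    """
    n = len(decisions)
    return [
        [1 if (j == i or decisions[j] == 1) else 0 for j in range(n)]
        for i in range(n)
    ]


def holders_of_full_contract(known):
    """Indices of the players who hold every signature."""
    return [i for i, row in enumerate(known) if all(row)]
```

For example, `holders_of_full_contract(broadcast_outcome([1, 0, 1]))` contains only player 1, the single player who withheld his signature, which is the situation that lets him enforce the contract unilaterally.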

Recall that, in H, every message sent by one player to
the center is broadcast to all other players. Thus, once one
player has sent a complaint about another (which includes a
fully signed contract), every player will know all signatures
on $c_h$ and be able to complain. A player who deviates in F
cannot choose to punish other players while remaining unscathed
himself, but he can choose unilaterally whether or
not to enforce the contract. Unfortunately, this power implies
that our previously specified equilibrium for the extended
game F · G · H is no longer an equilibrium.

The equilibrium for F · G · H discussed above requires that,
if i fails to reveal his signature on the contract, all players
coordinate on i's punishment equilibrium. Consider the case,
for instance, where the punishment equilibrium for some i
is $(a_i, a'_{-i})$, where $a_i$ is the action i is contractually obliged
to play. Suppose i alone fails to reveal his signature and all
players play i's punishment equilibrium $(a_i, a'_{-i})$. In stage
H, then, i will profit by choosing to enforce the contract: the
center will punish the other players and reward i. Knowing
this, the other players will not in general wish to play their
part of the punishment equilibrium, so our previous strategy
fails.


The  Pre-Contract  Protocol 

Although the naive broadcast protocol did not allow us to
guarantee all the payoffs we wanted, we shall see that we
can use a more complicated signature exchange stage F to
ensure that either each player receives all signatures on $c_h$,
or no player receives all signatures on $c_h$. Our exchange
scheme is modelled on the contracts mechanism of the rest
of the paper: we will add a pre-contract $c_{\hat{h}}$ that the players
will sign before signing $c_h$. This contract authorizes the
center to fine players who do not reveal their signature on $c_h$.
Surprisingly, this does not lead to infinite regress: this one
pre-contract is sufficient to allow for signature exchange.

We will divide F itself into stages: a miniature contract
exchange stage $\hat{F}$, a miniature execution stage $\hat{G}$, and a
miniature enforcement stage $\hat{H}$. The players will bind
themselves in contract $c_{\hat{h}}$ to reveal their signatures on the
contract $c_h$ in such a way that, if they fail to reveal them,
they can be fined by the center. We will allow them, however,
to recoup that fine by revealing their signatures on $c_h$ to the
center and all players after the fact. The after-the-fact
alteration of the outcome allows us to use the naive broadcast
protocol for $\hat{F}$ where we could not use it for F.

Formally, let the signature stage $F = \hat{F} \cdot \hat{G} \cdot \hat{H}$. Let
us call the contract signed in $\hat{F}$ the pre-contract $c_{\hat{h}}$, which
binds players to release their signatures on the real contract
$c_h$. $\hat{F}$ is the naive broadcast protocol defined above as the
stage game $\hat{F} = (N, T_{\hat{F}}, S, f)$. S is the set of signatures on
$c_{\hat{h}}$ that each player knows, and $T_{\hat{F}}$ is each player's choice to
broadcast or not broadcast his signature on $c_{\hat{h}}$. $\hat{G}$ is also the
naive broadcast protocol defined above, for the signatures on
$c_h$: $\hat{G} = (N, T_{\hat{G}}, \hat{S}, \hat{f})$, with $\hat{S}$ the set of known signatures
on $c_h$, and $T_{\hat{G}} : S \to \{0,1\}^n$ the decision of the players to
broadcast their signatures on $c_h$ given what signatures each
player knows on $c_{\hat{h}}$.

$\hat{H} = (N, T_{\hat{H}}, R^n, \hat{h})$ is a miniature enforcement stage
that is substantially different from the enforcement stage H.
$\hat{H}$ has two rounds. In the first round, a player i is allowed
to complain to the center that he has not received some
signature on $c_h$. To do so, i must submit the contract $c_{\hat{h}}$, all
signatures on $c_{\hat{h}}$, and i's signature on $c_h$. When the center
rebroadcasts this message to all players, both the signatures
on $c_{\hat{h}}$ and i's signature on $c_h$ become known to all players.
In the second round, each player who did not complain
in the first round is given a chance to complain. $T_{\hat{H}}$ simply
characterizes whether i will complain to the center after
each history.

We now state our chosen center protocol $\hat{h}$ in $\hat{H}$. If the
center received a complaint in the first round, then the center
fines all players who did not complain in either the first
or the second round by a large-enough amount M. If the
center does not receive any complaints, he does not fine any
players. Note that if all players complain, all signatures on
$c_h$ become common knowledge and no fines are assessed.
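The fining rule of the miniature enforcement stage can be sketched as a small Python function. The name `precontract_fines`, its arguments, and the default value of M are hypothetical choices for this illustration. The sketch assumes, in line with the proof below, that second-round complaints have no effect when nobody complained in the first round.

```python
def precontract_fines(n, first_round, second_round, M=100.0):
    """Fine charged to each of n players in the two-round stage.

    first_round, second_round: collections of player indices who
    complained in the respective round.
    """
    if not first_round:
        # No first-round complaint: the center does nothing.
        return [0.0] * n
    complained = set(first_round) | set(second_round)
    # Everyone who stayed silent in both rounds is fined M.
    return [0.0 if i in complained else M for i in range(n)]
```

If every player complains in one of the two rounds, the function returns all zeros, matching the remark that no fines are assessed in that case.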

We now specify the strategy $\pi^*$ we expect the agents to
play in F. In $\hat{F}$, each player first reveals his signature on
$c_{\hat{h}}$. In $\hat{G}$, he will reveal his signature on $c_h$ if and only if
he has received the signatures of every other player on $c_{\hat{h}}$.
In the enforcement stage $\hat{H}$, he complains to the center if
and only if he has received all signatures on $c_{\hat{h}}$ but has
not received all signatures on $c_h$, or if some other player has
complained.

The remainder of $\pi^*$ for G and H is simple. If all signatures
on $c_h$ become known to all players, then the players
play $(a_i, a_{-i})$ in G to achieve o, just as before, and then
complain to the center in H if some player deviates. If,
however, some player i deviates from the equilibrium in F
(whether by choosing not to reveal in $\hat{F}$, choosing not to
reveal in $\hat{G}$, or failing to complain in $\hat{H}$), in such a way
that the signatures on $c_h$ do not become commonly known,
then the agents coordinate on that player's punishment
equilibrium $p^i$. If several players deviate during F, the agents
coordinate on the punishment equilibrium of the last player
to deviate.

Our strategy $\pi^*$ is now an equilibrium. Thus, with a center
that does no work in equilibrium, we can achieve any outcome
that is achievable with a busy center.

Theorem 6 Let X be the extended game $\hat{F} \cdot \hat{G} \cdot \hat{H} \cdot G \cdot H$,
and let $p^i$ be the punishment equilibrium for i in G. Then,
for any $o(a_i, a_{-i})$ such that for all i, $V_i(a_i, a_{-i}) > V_i(p^i)$,
there exists a contract $c_h$ for which there is a strategy profile
$\pi^*$ such that $\pi^*$ is a subgame perfect equilibrium of the
extended game and $o(a_i, a_{-i})$ is the equilibrium outcome of
$\pi^*$. Furthermore, in $\pi^*$, the center takes no action during
any stage.

Proof: Let us sketch why this will be a subgame perfect
equilibrium. We will proceed by backwards induction.

So long as no single player gains complete knowledge of
the signatures on $c_h$, $\pi^*$ is a subgame perfect equilibrium
in G and H. This does not occur if $\pi^*$ is played in F,
so it is sufficient to prove that $\pi^*$ is a subgame perfect
equilibrium in F. We will show that, even after one deviation,
either all players know all the signatures on $c_h$ or no player
knows all signatures on $c_h$.

Consider the second round of $\hat{H}$. If no player complained
in the first round, second-round complaints have no effect.
If a player complained in the first round of $\hat{H}$, then it will
be dominant for every other player to complain according to
$\pi^*$ to avoid the punishment of M from the center. Thus, all
players will know all signatures on $c_h$.

Consider the first round of $\hat{H}$. There are three cases to
distinguish. First, if every player knows all signatures on
both $c_{\hat{h}}$ and $c_h$, then complaining will have no effect.
Second, suppose every player knows all signatures on $c_{\hat{h}}$, but
only one player i knows all signatures on $c_h$. Every other
player will complain in the first round and i must therefore
complain in the first or second round to avoid losing M. Every
player will learn all signatures. Third, if only one player i
knows all the signatures on $c_{\hat{h}}$, then i will not know all the
signatures on $c_h$. If i does not complain, no player will learn
all signatures on $c_h$ and players will coordinate on i's
punishment equilibrium. If i does complain, all others will
complain in the second round and all players will learn all
signatures.


Consider the stage $\hat{G}$. There are now two cases to consider.
First, suppose all players know all signatures on $c_{\hat{h}}$.
Then no player i can benefit by failing to reveal his signature
on $c_h$, since the other players will complain, i will complain
to avoid punishment, and all players will end up learning all
signatures on $c_h$. Second, suppose only one player i knows
the signatures on $c_{\hat{h}}$ because he failed to reveal in $\hat{F}$. Then
no other players will reveal their signatures, and, whether i
reveals or not, all players will coordinate on i's punishment
equilibrium.

Finally, consider the stage $\hat{F}$. Suppose one player i deviates
in the stage $\hat{F}$ by failing to reveal his signature on the
pre-contract $c_{\hat{h}}$. Then he alone will have all the signatures
on $c_{\hat{h}}$, and no one else will reveal their signatures on $c_h$ in
$\hat{G}$. According to the equilibrium, i will complain to the center
in stage $\hat{H}$, resulting in complete knowledge of $c_h$ and the
decision to play $(a_i, a_{-i})$. □

Conclusion 

We have discussed the power of a helpful center in enabling
a group of players to make contracts which require them to
play a certain strategy or face penalties. Even if the center
brings no money to the system and transfers money from the
players only after receiving permission, the center is able to
help the players achieve nearly any outcome of the game.
Moreover, we find that the center is still able to help the
players achieve these outcomes in equilibrium, even if he
does not monitor the game and does not participate on the
equilibrium path; in other words, even when the center does
no work in equilibrium beyond suggesting a contract.

In fact, if the contracts the center would suggest are common
knowledge or determined by a negotiation stage between
the agents, the center does no work whatsoever in
equilibrium. Incidentally, we notice that the center makes
a profit to cover his costs whenever his services are used.
These two properties are very important for a third party who
wishes to influence outcomes in strategic settings that occur
frequently, such as in the online auction setting.
Abstract

We look at the problem of computing a best response
to an opponent's strategy, when this
strategy is not known exactly but can instead be
sampled. We first give analytic results on the
number of samples required in order to approximate
the optimal best response. We then show
experimentally, on a variety of games, that one
can closely approximate the best response with
a much smaller number of samples than are required
by the formal guarantees. Finally, we go
beyond so-called oblivious sampling; that is, we
consider what happens if the opponent is aware
that the agent has taken the samples, if the agent
knows that the opponent is aware, and so on to
higher levels of mutual modelling.


1  Introduction 

Recent years have seen much research in AI on game-theoretic
multi-agent systems. Since most of game theory abstracts
away from computational issues, much of the recent
activity has concentrated on addressing algorithmic
considerations. In this work we consider a particular algorithmic
issue, namely computing a best response for one agent to
a fixed opponent (when the strategies are fixed, there is no
greater generality in considering a set of opponents, since
they can be viewed as a single player with an expanded
action space). We use the terms 'agent' and 'opponent' to
indicate the different status of the two players in the problem
we consider, but we interpret the term 'opponent' neutrally
(in particular, it does not mean that the game is necessarily
adversarial).

Although there are many other outstanding computational
issues in game theory (notably, computing one or all Nash
equilibria (Papadimitriou 2001)), best-response computation
is arguably the most relevant to AI. There are several
variants of the problem; for example, computing a best
response against a known opponent or a distribution of
opponents. The latter, for example, is the natural computational
question in the context of competitions such as the
Trading Agent Competition (Wellman & Wurman 1999).
Even against a known opponent, however, the problem is in
general intractable (see Gilboa & Zemel 1989). The problem
of computing a best response to a bounded automaton
was shown to be NP-complete in (Papadimitriou 1992),
while finding the best response for an arbitrary strategy can
be non-computable in some instances (Nachbar & Zame
1996).

Besides these complexity results, there are other reasons
why one might not be able to engage in a straightforward
best-response computation. For one thing, the opponent's
strategy might be too complicated to represent. Worse, it
might not be available at all. However, recalling that the
opponent's strategy is a distribution over its pure strategies,
we can sometimes sample from this distribution. There
are many real-world examples where such a situation could
arise naturally. For instance, a government might capture
some members of a terrorist organization and learn their
strategies. This sample can then be used to estimate the
strategies of the remaining agents. In a similar example,
one could study the code/strategy of a set of known computer
viruses and use this information to design effective
security against new viruses. Or, in a corporate setting,
companies regularly hire away members of their competitors'
organizations, giving them access to those employees'
knowledge/strategies. For organized games such as chess
or poker, there are large reservoirs of data on samples, both
from records of previous games played and from books written
describing proposed strategies for play.

We propose to analyze these situations, where the agent has
access to some number of samples from its opponent's
distribution and uses them to calculate a best response.
Several questions suggest themselves, including the following:

•  Theoretically speaking, how many samples do we
need to approximate the best response within a given
constant?

•  Empirically speaking, how good is a best response that
is based on a small sample?

•  What happens if the opponent is aware that the agent
has the samples, if the agent is aware of this, and so
on as one increases the levels of outguessing?

In the next section, we show a bound on the number of
samples required to guarantee a payoff within $\epsilon$ of the true
best response. The following section looks at the empirical
question, analyzing payoff achieved as a function of
samples across different game environments. Both of these
sections assume the opponent is oblivious to the fact that
the agent is taking samples. To address this limitation, we
then consider the results of mutual modelling, when the
opponent is aware that the samples have been taken. Finally,
we conclude with a few general observations and areas for
future work.

2  Oblivious  Sampling  -  Theoretical  Bounds 

Given our approach of drawing samples from an opponent's
distribution over strategies, a natural question is how
many samples are required to achieve a certain level of
performance. Using the Hoeffding inequality we can find
the minimum number of samples required to guarantee
with probability $1 - \delta$ that the best response we compute
achieves within $\epsilon$ of the payoff of the best response to the
true distribution (Hoeffding 1956). Let us assume that each
of our samples is an independent and identically distributed
random variable drawn from the true distribution and that
all payoffs are bounded within [0,1]. When computing the
best response we are effectively calculating the payoff of
each of the agent's k strategies from the set A against the
distribution of the m samples in S and choosing the action
with the highest payoff. Let

$$\hat{a} = \arg\max_{a \in A} \hat{V}(a), \qquad (1)$$

where

$$\hat{V}(a) = \frac{1}{m} \sum_{s \in S} R(a, s) \qquad (2)$$

and R(a, o) is the agent's payoff function for the game
when the agent plays a and the opponent plays o. Letting
$a^*$ be the true best response, we want

$$V(\hat{a}) \ge V(a^*) - \epsilon, \qquad (3)$$

where V(a) is the value achieved by strategy a against the
true distribution. Since $\hat{V}(a)$ is the mean of m iid random
variables, we know by the Hoeffding inequality (also known as
the Chernoff bound) that for any a and any $\gamma > 0$,

$$P(|\hat{V}(a) - V(a)| > \gamma) \le 2e^{-2\gamma^2 m}. \qquad (4)$$


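By the union bound in probability theory, we know that

$$P(\exists a : |\hat{V}(a) - V(a)| > \gamma) \le \sum_{a \in A} P(|\hat{V}(a) - V(a)| > \gamma) \qquad (5)$$

$$\le k \cdot 2e^{-2\gamma^2 m}, \qquad (6)$$

and therefore

$$P(\forall a : |\hat{V}(a) - V(a)| \le \gamma) \ge 1 - 2ke^{-2\gamma^2 m}. \qquad (7)$$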

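In particular if we have

$$|\hat{V}(\hat{a}) - V(\hat{a})| \le \gamma, \qquad (8)$$

$$|\hat{V}(a^*) - V(a^*)| \le \gamma \qquad (9)$$

and by (1)

$$\hat{V}(\hat{a}) \ge \hat{V}(a^*), \qquad (10)$$

we get

$$V(\hat{a}) \ge \hat{V}(\hat{a}) - \gamma \ge \hat{V}(a^*) - \gamma \ge V(a^*) - 2\gamma. \qquad (11)$$

Setting $\gamma = \epsilon/2$ and $2ke^{-2\gamma^2 m} = \delta$, we can solve for m to
get a bound of

$$m \ge \frac{2}{\epsilon^2} \log\left(\frac{2k}{\delta}\right) \qquad (12)$$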

which will guarantee that the best response to the samples
is within $\epsilon$ of the actual best response with probability $1 - \delta$.
Note that this bound is independent of the number of possible
opponent strategies and that it depends only on the log
of the size of the agent's strategy space. This lets us bypass
the potential complexity of the opponent's true strategy
distribution by evaluating only the set of agent strategies
required to include a best response to any possible
distribution. For instance, although the set of possible mixed
strategies is infinite, the set of pure strategies is guaranteed
to contain a best response to any mixed strategy. Although
the existence of these bounds is encouraging, the Hoeffding
inequality is well known in the learning theory community
to be quite loose. Further work has focused on tightening
the bounds, reducing the dependence on $\epsilon$ from $1/\epsilon^2$ to
$1/\epsilon$ in many situations. The question remains, however, how
many samples would be required in practice, which will be
the focus of the following section.
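To get a feel for the magnitudes involved, bound (12) can be evaluated directly. The helper below is a minimal sketch; the function name `sample_bound` is an assumption for this illustration, and the logarithm is taken to be the natural logarithm, as follows from solving $2ke^{-2\gamma^2 m} = \delta$ for m.

```python
import math


def sample_bound(k, epsilon, delta):
    """Smallest integer m with m >= (2 / epsilon^2) * ln(2k / delta).

    k: number of agent strategies, epsilon: payoff tolerance,
    delta: allowed failure probability (payoffs assumed in [0, 1]).
    """
    if k < 1 or not (0 < epsilon <= 1) or not (0 < delta < 1):
        raise ValueError("need k >= 1, 0 < epsilon <= 1, 0 < delta < 1")
    return math.ceil((2.0 / epsilon ** 2) * math.log(2.0 * k / delta))
```

Because k enters only through the logarithm, squaring the number of agent strategies less than doubles the bound, whereas halving $\epsilon$ quadruples it.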

3  Oblivious  Sampling  -  Empirical  Results 

While the theoretical results are encouraging, the number
of samples required could still prove impractical in many
instances. In order to test the empirical performance of
sampling, we performed a series of tests for different types
of artificial games and distributions of opponent strategies.
We found that the sampling performed quite well in all
of the domains and often achieved close to optimal
performance with only a small number of samples. We
initially divided the domains on the basis of their payoff
functions, analyzing zero-sum, common-payoff, and general-sum
games independently. However, the performance we
observed seemed independent of these categories. On
reflection, this is not particularly surprising for a fixed
opponent distribution, since it is oblivious to the samples being
taken. Given this situation, only the agent's payoffs are
relevant in determining a best response.

In order to illustrate these findings, we will show results
from two individual games, one-card poker and a repeated
game of chicken, as well as aggregate results for randomly
generated games. For each setting, an opponent distribution
is first selected and then m samples are drawn from
this distribution and provided to the agent. The agent
calculates a best response to this set of samples, weighting
all samples equally, and this best response strategy is tested
empirically against opponent strategies drawn from the true
distribution.
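The experimental loop just described can be sketched for a matrix game as follows. This is an illustration only, not the code used for the reported experiments: the function names, the payoff-matrix layout `R[a][o]`, and the choice to score the result against the exact expected payoff (rather than against fresh draws from the true distribution) are all assumptions.

```python
import random


def expected_payoffs(R, opponent_dist):
    """True value V(a) of each agent action against the opponent's mixture."""
    return [
        sum(p * R[a][o] for o, p in enumerate(opponent_dist))
        for a in range(len(R))
    ]


def sampled_best_response(R, samples):
    """Action maximizing the equally weighted payoff against the samples."""
    totals = [sum(R[a][s] for s in samples) for a in range(len(R))]
    return max(range(len(R)), key=lambda a: totals[a])


def fraction_of_optimal(R, opponent_dist, m, rng):
    """Draw m opponent samples, best-respond to them, and report the
    resulting true payoff as a fraction of the optimal true payoff."""
    samples = rng.choices(range(len(opponent_dist)), weights=opponent_dist, k=m)
    a_hat = sampled_best_response(R, samples)
    values = expected_payoffs(R, opponent_dist)
    return values[a_hat] / max(values)
```

A typical use would build a random payoff matrix with entries in [0,1], for instance `R = [[rng.random() for _ in range(k)] for _ in range(k)]` with `rng = random.Random(0)`, and average `fraction_of_optimal` over many repetitions for each sample size m.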

In our first experiment, we defined a game based on a
simplified version of poker. At the start of the game each of
two players receives a card uniformly selected from the set
of integers [1...N]. Each player then antes 1 chip, and a
player is randomly chosen to start betting. On his turn,
each player can choose either to bet B chips or to pass; play
then continues to the other player until either both players
have passed or one player folds (passes when the opponent
has bet a larger amount). If one player folds, the other player
wins the contents of the pot; otherwise the player with the
higher card wins (the pot is split in the case of a tie). The
results shown here are for the game where each player is
limited to a single bet. This gives each player a total of 12
distinct betting strategies which they can condition on the
card they receive, resulting in $12^N$ possible pure strategies.

We ran tests against three different opponent distributions.
The first was a uniform distribution assigning equal
probability to playing any of the $12^N$ pure strategies. The next
assigned a Gaussian distribution in strategy space about a
random pure strategy, while the final used a probability
distribution that combined two such Gaussians. The results
are shown in Figure 1 for games with bets of 4 units and
a deck of 10 cards. Similar results were also obtained for
games with a variety of values for B and N. Although not
shown, even in this zero-sum game the mini-max strategy
fares worse than even a small number of samples, since it
fails to take advantage of the opponent's liabilities.

For our next setting, we considered a repeated version of
the game of Chicken. Each player has a choice of two actions
at each time step, either to 'Dare' or 'Yield'. If both
players 'Dare', the result is the worst possible outcome for
each player, but a player can achieve its maximum reward
by daring while its opponent yields. In Figure 2 we show
the payoffs for the three versions of the game of chicken we
used in our experiments. In the repeated setting, the strategy
space allows the players to base their current actions on
the history of the last H outcomes within the game. Note
that we restrict the agent as well to only using information
from the last H outcomes during the repeated game,
thus resulting in essentially a one-shot game. Since there
are 4 possible outcomes, this results in a strategy space of
size $2^{4^H}$. Once again we obtained quick convergence to
the optimal response value with a relatively small number
of samples, despite the huge strategy space. The results are
qualitatively quite similar to those for single-card poker, so
we have omitted the graph, although this setting will be
referred to again in the following section.

Figure 1: Percent of maximum payoff achieved using the
given number of samples in single-card poker.

Finally, experiments were performed for randomly generated
matrix games. Each game has k actions for each agent
with the payoffs randomly distributed in the range [0,1].
For these games we added an additional opponent distribution
which was forced to assign zero probability to strategies
employing dominated actions. Results for general-sum
games of different sizes against this new opponent distribution
are shown in Figure 3. Results are basically identical
against a uniform distribution over opponents or when
the payoffs are restricted to be common-payoff. While the
agent can still achieve close to optimal payoffs, the number
of samples required increases more rapidly with the number
of actions than in the previous games, possibly indicating
an advantage of the structure in the previous games
when using sampling.
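
The random matrix game experiment can be illustrated with a minimal sketch. It is not the code used for the experiments: it assumes that every sampled opponent plays a single pure action (in the paper a sample is an entire strategy), and the names `random_game`, `best_response` and `fraction_of_optimal` are hypothetical.

```python
import random

def random_game(k, rng):
    """payoff[a][o]: agent's payoff for action a against opponent action o, in [0, 1)."""
    return [[rng.random() for _ in range(k)] for _ in range(k)]

def best_response(payoff, opponent_weights):
    """Return the agent action with the highest expected payoff, and all expected payoffs."""
    k = len(payoff)
    values = [sum(w * payoff[a][o] for o, w in enumerate(opponent_weights))
              for a in range(k)]
    return max(range(k), key=values.__getitem__), values

def fraction_of_optimal(payoff, true_dist, m, rng):
    """Best-respond to m sampled opponents, then score that action against the true distribution."""
    k = len(true_dist)
    draws = rng.choices(range(k), weights=true_dist, k=m)
    empirical = [draws.count(o) / m for o in range(k)]
    sampled_action, _ = best_response(payoff, empirical)
    _, true_values = best_response(payoff, true_dist)
    return true_values[sampled_action] / max(true_values)

if __name__ == "__main__":
    rng = random.Random(0)
    game = random_game(16, rng)
    uniform = [1 / 16] * 16
    for m in (1, 10, 100, 1000):
        runs = [fraction_of_optimal(game, uniform, m, rng) for _ in range(200)]
        print(m, sum(runs) / len(runs))
```

Averaging `fraction_of_optimal` over many runs for each sample size gives a curve of the same kind as the one described for Figure 3, although the numbers from this simplified setup need not match the reported ones.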

From the empirical results we can see that sampling is
effective in approximating a best response across a wide
variety of different environments. Most of these environments
allow us to achieve good performance with far fewer samples
than our theoretical bounds require. We can define a
metric for the ease with which this is possible by fixing a
desired performance guarantee (such as 99% of optimal)
and then comparing the sample complexity for achieving
this performance across games. Figure 4 shows the values
for such a metric applied to the games analyzed here.

Figure 2: Single round payoffs in three games of Chicken.

Figure 3: Percent of maximum payoff achieved for random
matrix games of different sizes.

4  Beyond  Oblivious  Sampling 

So far, we’ve assumed that the agent is the only one
allowed to explicitly reason about their opponent. They are
allowed to gather samples from the opponent’s distribution
and then compete against the original distribution
unaffected by their actions. What if the opponent is aware
that the samples have been taken? Presumably they might
use this information to change their strategy in response to
the new situation. We can then ask a number of questions
about this setting.

•  How should the opponent use its awareness that samples
were taken and how should the agent respond in
turn?

•  When and to whom is it an advantage to explicitly
model the other player?

•  What happens in the limit of infinite mutual modelling?

Figure 4: Samples required to approximate best response.
[Table only partly recoverable: the column headings and the
values for the first three rows are lost, apart from a single
stray value of 1.]

Poker (N=10, B=4, Uniform Dist.)
Poker (N=10, B=4, Gaussian Dist.)
Chicken (Payoffs 4_2_1, Uniform)
Random Games (16 Actions)      3    10    64
Random Games (64 Actions)      3    25   250
Random Games (256 Actions)     1    50   500

First we need to consider what kind of information the
opponent might have. They could know only that some samples
were taken, and nothing about the value or number of
samples. In this case it seems the only possible improvement
the opponent could make in their strategy would be
to assume the agent has calculated a best response to their
true distribution and play a best response to that. However,
if the opponent in addition knows how many samples were
taken (or has a distribution over likely sample sizes) they
can improve their performance. This may seem surprising,
but consider the game shown in Figure 5, with the opponent’s
original distribution consisting of 75% agents which
always play L and 25% agents which always play R.

If the agent calculates the true best response, it will
always play B, forcing the opponent (the column player)
to always play L. However, if the agent only receives
one sample, then 75% of the time it will choose T as its
best response instead. Now, playing L only yields a payoff
of 1, while playing R yields an expected payoff of 3
($75\% \times V(T,R) + 25\% \times V(B,R) = 0.75 \times 4 + 0.25 \times 0 = 3$).
In this example, if the agent has taken many samples their play
will likely correspond to the best response strategy for the
true distribution; however, there is no guarantee this is true
in general. While we showed earlier that the value achieved
by the agent will approach the value of the best response,
this does not in general imply that the play will approach
the best response strategy for any finite number of samples.
The agent may have a strategy that yields within ε of the
maximum value but lies arbitrarily far away in the strategy
space. In this case, there is no guarantee that playing
the best response to the true best response is a good strategy
against the ε-best response actually played. In practice
this tends to work well for large numbers of samples, but
is dominated by employing the correct Bayesian calculation
using the actual number of samples taken. The exact
calculation can prove infeasible for large strategy spaces,
but once again we can employ sampling to approximate it,
this time from the opponent’s point of view. The opponent
can randomly draw sets of samples from its original true
distribution to match the samples the agent may have taken
and use the best responses to these sample sets as its samples
from the agent’s true distribution. In Figure 6 we can
see the results of using this method for a variety of sample
sizes. The measure here is how much value is lost by the
opponent relative to the value they could have achieved if
they knew the exact samples taken by the agent. We can
see that this improves as the opponent simulates more sample
sets and that the loss from not knowing the samples is
highest when a single sample was taken.

Figure 5: Game showing the value of knowing the sample
size.

Figure 6: Opponent’s loss of value in simple poker setting
versus that achieved with full information on samples taken
for different agent sample sizes.
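
The opponent-side sampling procedure for the 75%/25% example can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. Only V(T,R) = 4, V(B,R) = 0 and the payoff of 1 for playing L against T come from the text. The value V(B,L) = 1 and the entire `AGENT_PAYOFF` table are assumed, and were chosen so that T is the agent's best response to L, while B is its best response both to R and to the true mixture. The function names are hypothetical.

```python
import random

# Opponent (column player) payoffs; V(B, L) = 1 is an assumption.
OPPONENT_PAYOFF = {("T", "L"): 1, ("B", "L"): 1, ("T", "R"): 4, ("B", "R"): 0}
# Agent (row player) payoffs: entirely assumed for illustration.
AGENT_PAYOFF = {("T", "L"): 1, ("B", "L"): 0, ("T", "R"): -4, ("B", "R"): 0}
# True opponent distribution from the text: 75% always play L, 25% always play R.
TRUE_DIST = {"L": 0.75, "R": 0.25}

def agent_best_response(samples):
    """Agent's best response to a set of sampled opponent actions."""
    totals = {a: sum(AGENT_PAYOFF[(a, s)] for s in samples) for a in ("B", "T")}
    return max(totals, key=totals.get)  # ties resolve to B, which is listed first

def opponent_response(num_agent_samples, num_sets, rng):
    """Simulate sample sets the agent might have drawn and best-respond to the
    resulting empirical distribution over the agent's actions."""
    columns, weights = list(TRUE_DIST), list(TRUE_DIST.values())
    plays = {"T": 0, "B": 0}
    for _ in range(num_sets):
        simulated = rng.choices(columns, weights=weights, k=num_agent_samples)
        plays[agent_best_response(simulated)] += 1
    values = {
        c: sum(n / num_sets * OPPONENT_PAYOFF[(a, c)] for a, n in plays.items())
        for c in columns
    }
    return max(values, key=values.get), values

if __name__ == "__main__":
    for m in (1, 5, 200):
        print(m, opponent_response(m, 2000, random.Random(0)))
```

With one agent sample the simulated agent plays T about 75% of the time, so R is worth roughly 3 to the opponent and beats L. With many agent samples the simulated agent almost always plays B, and L becomes the better reply under these assumed payoffs.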

Next, we consider the case in which the opponent knows
not only the number of samples but also the value of each
sample. They could assume the agent would calculate a
best response to those samples and therefore calculate their
own best response to the agent’s strategy. Of course, this
process can escalate indefinitely, as the agent realizes the
opponent knows the samples were taken, etc.

For the settings we described previously, we analyzed the
results of this recursive modelling under different assumptions
about the level of mutual modelling carried out by
each agent. Results for the individual games are discussed
in the following subsections, organized into zero-sum,
common-payoff, and general-sum games.

Note that for these results we assumed that at least one
player had correct beliefs about the other player and also
that the other player’s beliefs were accurate except for not
accounting for the last level of analysis performed. As
the players’ beliefs depart further from the actual situation,
the payoffs tend to move towards those achieved by a random
strategy. This generally corresponds to a decrease in
value for each agent for team games and many general-sum
games.

4.1  Zero-sum  Games 

Within the simple poker game, as each player performs
additional introspection about the other’s strategy, the
magnitude of the payoffs increases until the play settles into
a simple cycle of alternating strategies. In this cycle, the
primary driver of value is holding the correct beliefs about
the level of analysis the other player has performed. When
holding correct beliefs each player is guaranteed at least
the value of the game, since they are performing a best
response calculation. Payoffs for the sampling agent in
the simplified poker game are shown in Figure 7. The
points labelled OA correspond to the agent calculating a
best response to the samples received without the opponent
changing strategies. Each point to the right then advances
the level of mutual modelling by one (with the letter
indicating which player has the informational advantage). The
number of samples taken has little impact on payoff; taking
more samples mainly serves to decrease the magnitude of the
oscillations. Presumably this is because the strategies found
with larger numbers of samples are slightly more robust.

Figure 7: Payoffs for increased levels of introspection

4.2  Common-payoff  Games 

In our experiments, the play would often converge
rapidly to the Pareto-optimal Nash equilibrium of
the game as the players reasoned more deeply about one
another. In general, the players are guaranteed to converge to
playing a Nash equilibrium for some finite amount of mutual
modelling (assuming a finite action space and generic
payoffs). This follows since each player has full knowledge
of the best response calculated by the other for any level of
reasoning. Given this, each subsequent best response
calculation is guaranteed to be monotonically nondecreasing
in the common value achieved by all players. For two-player
games or games with generic payoffs, the players
will not enter a cycle where they return to a previously
encountered outcome until they reach a fixed-point outcome.
Since there are a finite number of distinct outcomes, they
must eventually arrive at an outcome where each player is
playing its best response, meeting the definition of a Nash
equilibrium.
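
The convergence argument can be checked numerically with a small sketch. It assumes a two-player common-payoff game stored as one matrix of generic (random) payoffs, with the row player as the agent. The function names are hypothetical and this is not the experimental code.

```python
import random

def alternating_best_responses(value, start_col, max_levels=1000):
    """value[r][c] is the payoff both players receive. The row player first
    best-responds to start_col; then the players alternate best responses."""
    n_rows, n_cols = len(value), len(value[0])
    col = start_col
    row = max(range(n_rows), key=lambda r: value[r][col])
    trace = [value[row][col]]
    for _ in range(max_levels):
        new_col = max(range(n_cols), key=lambda c: value[row][c])
        new_row = max(range(n_rows), key=lambda r: value[r][new_col])
        if (new_row, new_col) == (row, col):
            break  # fixed point: each action is a best response to the other
        row, col = new_row, new_col
        trace.append(value[row][col])
    return (row, col), trace

def is_pure_nash(value, row, col):
    """In a common-payoff game, check that neither player gains by deviating."""
    column_values = [r[col] for r in value]
    return value[row][col] == max(column_values) and value[row][col] == max(value[row])

if __name__ == "__main__":
    rng = random.Random(1)
    game = [[rng.random() for _ in range(6)] for _ in range(6)]
    outcome, trace = alternating_best_responses(game, start_col=0)
    print(outcome, trace, is_pure_nash(game, *outcome))
```

Each entry of `trace` is at least as large as the previous one, and with generic payoffs the loop stops at a pure Nash equilibrium, although not necessarily the Pareto-optimal one.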

4.3  General-sum  Games 

We ran additional tests on the game of centipede. In this
game, each player alternates in choosing whether to stop
the game or continue for up to one hundred turns. Each
player then receives a reward equal to the number of turns
the game continued plus a bonus reward of two for the
player who chose to stop the game. Looking at Figure 8,
we can see that there is at first a jump in value to both players
since the agent’s initial best response is near the strategy
of always continuing. The value then drops steadily for
increased introspection, matching the process of backwards
induction. (If no player would ever continue past k steps,
then the best response is to always stop the game at k-1
steps.) The limit of this process is the unique Nash equilibrium,
where each player chooses to stop immediately.

Figure 8: Value achieved by each player for successive levels
of mutual modelling in the centipede game.
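
The unravelling can be reproduced with a short sketch. It relies on assumptions the text does not spell out: the agent moves on odd turns and the opponent on even turns, a strategy is the first turn at which a player is willing to stop, "turns continued" means the number of turns before the stopping turn, and modelling starts from an opponent who never stops. All function names are hypothetical.

```python
TURNS = 100  # maximum number of turns, as in the text

def play(thresholds):
    """thresholds[p] is the first turn at which player p is willing to stop.
    Player 0 (the agent) moves on odd turns, player 1 on even turns."""
    for turn in range(1, TURNS + 1):
        mover = (turn - 1) % 2
        if thresholds[mover] <= turn:
            payoff = [turn - 1, turn - 1]  # turns the game continued
            payoff[mover] += 2             # bonus for the player who stops
            return payoff
    return [TURNS, TURNS]                  # nobody stopped

def best_response(player, other_threshold):
    """Best stopping turn for `player`, given the other player's stopping turn."""
    own_turns = [t for t in range(1, TURNS + 1) if (t - 1) % 2 == player]
    candidates = own_turns + [TURNS + 1]   # TURNS + 1 means "never stop"

    def value(threshold):
        pair = [0, 0]
        pair[player] = threshold
        pair[1 - player] = other_threshold
        return play(pair)[player]

    return max(candidates, key=value)      # ties go to the earliest turn

def mutual_modelling(levels):
    """Alternate best responses, starting from an opponent who never stops."""
    thresholds = [TURNS + 1, TURNS + 1]
    stops = []
    for level in range(levels):
        player = level % 2                 # the agent reasons first
        thresholds[player] = best_response(player, thresholds[1 - player])
        stops.append(thresholds[player])
    return stops

if __name__ == "__main__":
    print(mutual_modelling(99))
```

Each extra level of modelling moves the stopping turn one step earlier, from 99 down to 1, which mirrors the backward-induction pattern described for Figure 8.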

In the repeated game of chicken, play once again converges
to a Nash equilibrium in the limit. However, this game has
multiple Nash equilibria corresponding to different players
choosing ‘Dare’ when their opponent chooses ‘Yield’. The
beliefs in the game will tend to converge to the Nash
equilibrium closest in the strategy space to the agent’s initial
best response, which is a function of the particular payoffs.
In Figure 9, we can see the payoffs to each player
for increasing amounts of mutual modelling by both players.
Again, each point to the right along the x-axis indicates
one more level of analysis by one of the players.

Looking at these two games we can address the question
of whether informational advantage always offers a better
payoff. It turns out that the answer depends on who currently
has the advantage. Unsurprisingly, if the opponent
currently has the advantage, it can never hurt the agent’s
payoff (in the short term) to do further analysis to take the
advantage, since this is a best-response calculation. However,
if the agent currently has the advantage, the opponent
taking the advantage could help. This is particularly true
in team games where all players are maximizing the same
function, but can also occur in some general-sum games,
such as the repeated game of chicken shown above. Notice
in Figure 9 that when the opponent first starts to reason
about the agent taking samples, the payoffs to both
players go up.

Figure 9: Value achieved by each player for successive levels
of mutual modelling in a repeated game of chicken.

Figure 10: Sample game in which recursive modelling does
not converge to a Nash equilibrium.

Given that these games have all converged to Nash equilibria,
will this process in general converge to a pure strategy
Nash equilibrium when one exists? If the process does
converge, it must be to a Nash equilibrium since each player
calculates a best response. However, it turns out not to
converge in all games since it is possible for the best responses
to cycle through outcomes without encountering an
equilibrium. Consider the game shown in Figure 10 where
the agent received a sample opponent strategy that always
plays T. One can see that mutual modelling will result
in each player alternating between always playing H or
always playing T without ever reaching either the mixed
Nash of playing H and T each with 50% probability or the
pure Nash equilibrium with both players playing X.
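
The cycling behavior can be reproduced with a small sketch. The payoff matrix of Figure 10 is not available in the text, so the payoffs below are hypothetical. They are a matching-pennies core on H and T, in which the agent wants to match and the opponent wants to mismatch, plus an action X that is an equilibrium only when both players choose it.

```python
ACTIONS = ("H", "T", "X")

def payoffs(agent, opponent):
    """Hypothetical payoffs (agent, opponent); not taken from Figure 10."""
    if agent == "X" and opponent == "X":
        return (0.0, 0.0)        # pure Nash equilibrium
    if "X" in (agent, opponent):
        return (-0.5, -0.5)      # X is never a best response to H or T
    return (1.0, -1.0) if agent == opponent else (-1.0, 1.0)

def best_response(player, other_action):
    """Best pure action for player 0 (agent) or 1 (opponent)."""
    if player == 0:
        return max(ACTIONS, key=lambda a: payoffs(a, other_action)[0])
    return max(ACTIONS, key=lambda a: payoffs(other_action, a)[1])

def mutual_modelling(sampled_opponent_action, levels):
    """Alternate best responses, starting from the agent's reply to its sample."""
    agent, opponent = None, sampled_opponent_action
    history = []
    for level in range(levels):
        if level % 2 == 0:
            agent = best_response(0, opponent)
        else:
            opponent = best_response(1, agent)
        history.append((agent, opponent))
    return history

if __name__ == "__main__":
    print(mutual_modelling("T", 8))
```

Starting from a sample that always plays T, the beliefs cycle through (T,T), (T,H), (H,H), (H,T) with period four and never reach (X,X), even though (X,X) is a pure Nash equilibrium of this assumed game.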

5  Conclusions  and  Future  Work 

Most  of  our  results  fall  in  line  with  our  predictions  stated 
in  the  introduction,  although  not  without  a  few  surprises. 

•  Sampling  can  be  shown  to  approximate  the  true  best 
response  with  a  number  of  samples  logarithmic  in  the 
size  of  the  agent’s  strategy  space. 

•  Empirical  results  show  sampling  to  be  effective 
against  a  variety  of  hidden  opponent  distributions  and 
different  game  environments. 

•  Analyzing successive mutual modelling by the players
results in either convergence to equilibrium or cyclic
behavior, reminiscent of fictitious play in repeated
games (Brown 1951).

•  Even a small difference in player knowledge, such as
information about the number of samples taken, can
produce a major difference in the value an agent can
achieve.

•  While in general the advantage from modelling falls
to the player with correct beliefs, games with some
degree of cooperation allowed one player to benefit
from increased mutual modelling by the other player.
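
The logarithmic dependence in the first point can be illustrated with a standard Hoeffding and union bound argument. This sketch is not necessarily the proof given earlier in the paper. It assumes payoffs in $[0,1]$, a finite agent strategy set $S$, and $m$ independent samples $\sigma_1, \ldots, \sigma_m$ from the opponent distribution; the symbols $u$, $\hat{u}$, $\hat{s}$, $s^*$, $\epsilon$ and $\delta$ are introduced here for illustration.

```latex
% u(s): true expected payoff of strategy s; \hat{u}(s): its average over the m samples.
\begin{align*}
\Pr\bigl[\,|\hat{u}(s) - u(s)| \ge \epsilon/2\,\bigr]
  &\le 2\exp\!\bigl(-m\epsilon^{2}/2\bigr)
  && \text{(Hoeffding, for a fixed } s \in S\text{)} \\
\Pr\bigl[\,\exists s \in S : |\hat{u}(s) - u(s)| \ge \epsilon/2\,\bigr]
  &\le 2\,|S| \exp\!\bigl(-m\epsilon^{2}/2\bigr) \le \delta
  && \text{whenever } m \ge \frac{2}{\epsilon^{2}} \ln\frac{2|S|}{\delta} .
\end{align*}
% On the complementary event, the empirical best response \hat{s} satisfies
\[
u(\hat{s}) \;\ge\; \hat{u}(\hat{s}) - \tfrac{\epsilon}{2}
          \;\ge\; \hat{u}(s^{*}) - \tfrac{\epsilon}{2}
          \;\ge\; u(s^{*}) - \epsilon .
\]
```

Under these assumptions the required number of samples grows with $\ln |S|$, so even a strategy space of size $2^{N}$ needs a number of samples only linear in $N$.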

Several of our findings encourage further study of this
approach. For many of the settings tested, the empirical
performance far exceeded the theoretical guarantees for small
numbers of samples. Although the bounds were known
to be weak, can we more clearly characterize the factors
leading to settings where sampling is most effective? One
possible factor relates to qualities of the agent’s strategy
space. In the proof we assumed a worst case distribution of
payoffs for the agent’s strategies. In practice the effective
space of strategies that need be considered may be much
smaller for many games. For instance, there may be many
strategies that achieve (nearly) optimal payoffs against the
true distribution, creating a wider target to choose from.
Also, there may be many strategies that fail to be best
responses against (almost) any strategy selected by the
samples. These strategies can then be ignored in the calculation
since there is no chance of their being confused with an
optimal strategy. A first step in better defining the space of
games where sampling is effective could be achieved by
testing the approach over a wide sample of games and
opponent distributions. A machine learning approach could
then be used to learn what parameters affect the number
of samples required. Given this understanding, it may be
possible to tighten the theoretical bounds on the number of
samples required by adding additional restrictions on the
strategy space and the payoffs of the game.

While both agents could profit in general-sum games with
some coordination, the exact values achieved could vary
significantly based on the starting samples, as seen in the
Repeated Chicken game. This opens up a further question
of how an opponent might take advantage of this dependence.
What if the opponent had the ability to manipulate
the samples the agent received? This could occur either
through adding additional samples of their choosing
(planted evidence or double agents) or changing the
distribution the agent is sampling from.

Additionally, all of our detailed analysis on recursive
modelling assumed that the agents had certainty over the
information they possessed about the other player (even when
one agent was wrong). What if we instead allow the agents
to have more complex beliefs over the possible information
or beliefs of the other agent? This may allow the agents to
develop more robust strategies as a result of their uncertainty.
Note that as long as the agents can characterize their
uncertainty precisely they can form a probability distribution
over possible opponent strategies and still calculate a
pure strategy best response. If however they believe their
opponent may outguess them in unforeseen ways, there can
be pressure to play mixed strategies that reduce the penalty
of losing the informational advantage. Given that equilibria
are often justified as the endpoints of infinite analysis
among perfectly rational opponents, are there any reasonable
constraints we could add to the players’ process of mutual
modelling such that in the limit it would also converge
to an equilibrium?

In  terms  of  direction  for  our  future  work,  probably  the 
greatest  limitation  of  the  current  approach  is  that  it  requires 
that  the  samples  from  the  opponent’s  distribution  be  entire 
strategies.  For  instance,  in  the  simple  poker  setting  each 
sample  contains  the  betting  strategy  for  the  opponent  given 
any  possible  card.  In  many  situations  it  is  much  easier  to 
acquire  part  of  a  strategy,  such  as  the  on-path  play  for  a 
previous game. Answering the question of how best to utilize
this weaker information would help expand the applicability of
this work to a wider class of settings.

Abstract

Dispersion games are the generalization of the anti-coordination
game to arbitrary numbers of agents and actions.
In these games agents prefer outcomes in which the
agents are maximally dispersed over the set of possible actions.
This class of games models a large number of natural
problems, including load balancing in computer science,
niche selection in economics, and division of roles within a
team in robotics. Our work consists of two main contributions.
First, we formally define and characterize some interesting
classes of dispersion games. Second, we present several
learning strategies that agents can use in these games,
including traditional learning rules from game theory and
artificial intelligence, as well as some special purpose strategies.
We then evaluate analytically and empirically the
performance of each of these strategies.

Introduction 

A natural and much studied class of games is the set of
so-called coordination games, one-shot games in which both
agents win positive payoffs only when they choose the same
action (Schelling 1960).[1] A complementary class that has
received relatively little attention is the set of games in which
agents win positive payoffs only when they choose distinct
actions; these games have sometimes been called the
anti-coordination games. Most discussion of these games has
focused only on the two-agent case (see Figure 1), where
the coordination game and the anti-coordination game differ
by only the renaming of one player’s actions. However,
with arbitrary numbers of agents and actions, the two games
diverge; while the generalization of the coordination game
is quite straightforward, that of the anti-coordination game
is more complex. In this paper we study the latter, which
we call dispersion games (DGs), since these are games in
which agents prefer to be more dispersed over actions.[2]
Although one can transform a dispersion game into a
coordination game in which agents coordinate on a maximally
dispersed assignment of actions to agents, the number of such
assignments grows exponentially with the number of agents.

*This work is supported in part by DARPA Grant F30602-00-2-0598
and by a Benchmark Stanford Graduate Fellowship.
Copyright © 2002, American Association for Artificial
Intelligence (www.aaai.org). All rights reserved.

[1] In this paper, we assume familiarity with basic game theory;
our formulations are in the style of (Osborne & Rubinstein 1994).

[2] We chose this name after (Alpern 2001), who studies a subclass
of these games which he calls spatial dispersion problems.

Figure 1: Two-agent coordination game (left) and
anti-coordination game (right).
DGs model natural problems in a number of different domains.
Perhaps the most natural application is presented by
the much studied load balancing problem (see, e.g., Azar
et al. 2000). This problem can be modeled as a DG in
which the agents are the users, the possible actions are the
resources, and the equilibria of the game are the outcomes
in which agents are maximally dispersed. Another natural
application of DGs is presented by the niche selection problem
studied in economics and evolutionary biology. In a
general niche selection problem, each of n oligopoly producers
wishes to occupy one of k different market niches,
and producers wish to occupy niches with fewer competitors.
Other niche selection problems include the Santa Fe
bar problem proposed by Arthur (1994), and the class of
minority games (Challet & Zhang 1997). These niche selection
problems can all be modeled in a straightforward manner by
DGs. Finally, we note that DGs can also serve as a model of
the process of role formation within teams of robots. In fact,
the initial motivation for this research came from empirical
work on reinforcement learning in RoboCup (Balch 1998).

This paper makes two types of contributions. First, we
formally define and characterize some classes of DGs that
possess special and interesting properties. Second, we analyze
and experimentally evaluate the performance of different
learning strategies in these classes of games, including
two standard learning rules from game theory and artificial
intelligence, as well as two novel strategies. The remainder
of this article is organized as follows. In the first section we
present the game definitions. In the second and third sections
we present the learning strategies and the results and
analysis of their performance. Finally, in the last section we
discuss these findings and present ideas for future research.

Dispersion  Game  Definitions 

In this section we begin by discussing some simple dispersion
games, and work our way gradually to the most general
definitions. All of the DGs we define in this section are
subclasses of the set of normal form games, which we define as
follows.

Definition 1 (CA, CP, CACP games) A normal form game
G is a tuple $\langle N, (A_i)_{i \in N}, (\succeq_i)_{i \in N} \rangle$, where

•  N is a finite set of n agents,

•  $A_i$ is a finite set of actions available to agent $i \in N$, and

•  $\succeq_i$ is the preference relation of agent $i \in N$, defined
on the set of outcomes $O = A^n$, that satisfies the von
Neumann-Morgenstern axioms.

A game G is a common action (CA) game if there exists a
set of actions A such that for all $i \in N$, $A_i = A$; we
represent a CA game as $\langle N, A, (\succeq_i)_{i \in N} \rangle$. Similarly, a game is
a common preference (CP) game if there exists a relation $\succeq$
such that for all $i \in N$, $\succeq_i = \succeq$; we represent a CP game as
$\langle N, (A_i)_{i \in N}, \succeq \rangle$. We denote a game that is both CA and CP
as CACP. We represent a CACP game as $\langle N, A, \succeq \rangle$.

Note that we use the notation $(a_1, \ldots, a_n)$ to denote the
outcome in which agent 1 chooses action $a_1$, agent 2 chooses
action $a_2$, and so on. In a CA game where $|A| = k$, there
are $k^n$ total outcomes.

Common  Preference  Dispersion  Games 

Perhaps the simplest DG is that in which n agents independently
and simultaneously choose from among n actions,
and the agents prefer only the outcomes in which they all
choose distinct actions. (This game was defined independently
in (Alpern 2001).) We call these outcomes the maximal
dispersion outcomes (MDOs).

This simple DG is highly constrained. It assumes that
the number of agents n is equal to the number of actions
k available to each agent. However, there are many problems
in which $k \neq n$ that we may wish to model with DGs.
When $k > n$ the game is similar to the $k = n$ game but
easier: there is a larger proportion of MDOs. When $k < n$,
however, the situation is more complex: there are no outcomes
in which all agents choose distinct actions. For this
reason, we will need a more general definition of an MDO.
In the definitions that follow, we use the notation $n^o_a$ to denote
the number of agents selecting action a in outcome o.

Definition 2 (MDO) Given a CA game G, an outcome
$o = (a_1, \ldots, a_i, \ldots, a_n)$ of G is a maximal dispersion
outcome iff for all agents $i \in N$ and for all outcomes
$o' = (a_1, \ldots, a'_i, \ldots, a_n)$ such that $a'_i \neq a_i$, it is the case
that $n^o_{a_i} \le n^{o'}_{a'_i}$.

In other words, an MDO is an outcome in which no agent
can move to an action with fewer other agents. Note that
when the number of agents is less than or equal to the number
of actions, an MDO assigns each agent to a distinct
action, as above.


Under this definition, the number of MDOs in a general
CA game with n agents and k actions is given by

$$\mathrm{MDO}(n, k) = \binom{k}{n \bmod k} \frac{n!}{\lceil n/k \rceil!^{\,n \bmod k} \; \lfloor n/k \rfloor!^{\,k - n \bmod k}} .$$

When $k = n$ this expression simplifies to $n!$, since there are
$n!$ ways to allocate n agents to n actions.
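
The counting formula can be checked against Definition 2 by brute force on small games. The sketch below is illustrative only; actions are numbered 0 to k-1, and the function names are hypothetical.

```python
from itertools import product
from math import comb, factorial

def mdo_count_formula(n, k):
    """Closed-form number of maximal dispersion outcomes for n agents, k actions."""
    r = n % k                      # number of actions holding ceil(n/k) agents
    hi, lo = -(-n // k), n // k    # ceil(n/k) and floor(n/k)
    return comb(k, r) * factorial(n) // (factorial(hi) ** r * factorial(lo) ** (k - r))

def is_mdo(outcome, k):
    """Definition 2: no agent can move to an action where it would end up
    with fewer agents (itself included) than at its current action."""
    counts = [outcome.count(a) for a in range(k)]
    for current in outcome:                # current action of each agent
        for other in range(k):
            if other != current and counts[current] > counts[other] + 1:
                return False
    return True

def mdo_count_bruteforce(n, k):
    """Enumerate all k**n outcomes and count those satisfying Definition 2."""
    return sum(is_mdo(o, k) for o in product(range(k), repeat=n))

if __name__ == "__main__":
    for n in range(1, 6):
        for k in range(1, 5):
            print(n, k, mdo_count_formula(n, k), mdo_count_bruteforce(n, k))
```

For example, with n = 3 agents and k = 2 actions both functions return 6: the MDOs are the outcomes with a 2-1 split.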

The simple DG presented above also makes another
strong assumption. It assumes that an agent’s preference
over outcomes depends only on the overall configuration of
agents and actions in the outcome (such as the number of
agents that choose distinct actions), but not on the particular
identities of the agents or actions (such as the identities of
the actions that are chosen). We call these the assumptions
of agent symmetry and action symmetry. However, many
situations we might like to model are not agent and action
symmetric. For example, role formation on soccer teams is
not action symmetric. The identity of a particular field position
in an outcome can affect the performance of the team: a
team with a goalie but no halfback would probably perform
better than one with a halfback but no goalie, all else being
equal. Robot soccer is also not necessarily agent symmetric.
If agent 1 is a better offensive than defensive player, then a
team may perform better if agent 1 is a forward instead of a
fullback, all else being equal. We use the following formal
definitions of symmetry.

Definition 3 (Agent Symmetry) A CA game
$G = \langle N, A, (\succeq_i)_{i \in N} \rangle$ is agent symmetric iff for all outcomes
$o = (a_1, \ldots, a_i, \ldots, a_n)$, and for all permutations
$o' = (a'_1, \ldots, a'_i, \ldots, a'_n)$ of o, for all $i \in N$, $o \succeq_i o'$ and
$o' \succeq_i o$.

Definition 4 (Action Symmetry) A CA game
$G = \langle N, A, (\succeq_i)_{i \in N} \rangle$ is action symmetric iff for all outcomes
$o = (a_1, \ldots, a_i, \ldots, a_n)$ and $o' = (a'_1, \ldots, a'_i, \ldots, a'_n)$, if
there exists a one-to-one mapping $f : A \to A$ such that
for all $i \in N$, $f(a_i) = a'_i$, then for all $i \in N$, $o \succeq_i o'$ and
$o' \succeq_i o$.

In fully symmetric games, agents cannot distinguish between
outcomes with the same configuration of numbers of
agents choosing actions. Thus we use the abbreviated notation
$\{n_1, \ldots, n_k\}$ to refer to the set of outcomes in which
$n_1$ agents choose some action, $n_2$ agents choose a different
action, and so on. By convention, we order the actions from
most to least populated.
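
As a small illustration of the abbreviated notation, the sketch below maps an outcome to its configuration of counts. It assumes, purely for illustration, that actions are numbered 0 to k-1 and that an outcome is a tuple holding one action index per agent; the function name `configuration` is ours and not part of the formal development.

```python
from collections import Counter

def configuration(outcome, num_actions):
    """Return the per-action agent counts of an outcome, most populated first."""
    counts = Counter(outcome)
    # Unoccupied actions appear as zeros, so every configuration has k entries.
    per_action = [counts.get(a, 0) for a in range(num_actions)]
    return tuple(sorted(per_action, reverse=True))

# Two outcomes that differ only by renaming agents and actions
# share one configuration.
print(configuration((0, 0, 1, 2), 4))
print(configuration((3, 1, 3, 0), 4))
```

In a fully symmetric game, any two outcomes with the same configuration are equally preferred by every agent, which is why the configuration alone suffices to describe an outcome.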

We  are  now  ready  to  state  the  formal  definition  of  a  weak 
DG  that  is  well  defined  over  the  set  of  all  CACP  games, 
including  asymmetric  games  and  games  with  arbitrary  n,  k. 

Definition 5 (Weak DG) A CACP game $G = (N, A, \succeq)$ is
a weak dispersion game iff the set of $\succeq$-maximal outcomes
of G is a subset of the set of MDOs of G.

This  definition  requires  only  that  at  least  one  of  the  MDOs 
is  a  preferred  outcome,  and  that  none  of  the  non-MDOs 
is  a  preferred  outcome.  This  definition  is  weak  because  it 
places no constraints on the preference ordering for the
non-maximally-preferred outcomes.³ For this reason, we also
state a strong definition. Before we can state the definition,
however, we will need the following dispersion relation.

³ The reader may wonder why our definitions don't require that
all MDOs are maximal outcomes. In fact, it is easy to verify that
this must be the case in a fully symmetric DG.

Definition 6 ($\sqsupseteq$) Given two outcomes
$o = (a_1, \ldots, a_i, \ldots, a_n)$ and $o' = (a'_1, \ldots, a'_i, \ldots, a'_n)$,
we have that $o \, D \, o'$ iff there exists an agent $i \in N$ such
that $a'_i \ne a_i$, and $n^{o}_{a_i} < n^{o'}_{a'_i}$, and for all other agents
$j \in N$, $j \ne i$, $a_j = a'_j$. We let the dispersion relation $\sqsupseteq$ be
the reflexive and transitive closure of $D$.

In  other  words,  o  is  more  dispersed  than  o'  if  it  is  possible 
to  transform  o'  into  o  by  a  sequence  of  steps,  each  of  which 
is  a  change  of  action  by  exactly  one  agent  to  an  action  with 
fewer  other  agents.  It  is  important  to  note  that  the  dispersion 
ordering  is  a  structural  property  of  any  CACP  game.  The 
dispersion  relation  over  the  set  of  outcomes  forms  a  partially 
ordered set (poset). Note that the set of MDOs is just the set
of $\sqsupseteq$-maximal elements of O.
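
The single-step relation can be checked mechanically. The sketch below assumes outcomes are tuples of action indices in 0..k-1 (our encoding, chosen for illustration) and tests whether one outcome is obtained from another by a single agent moving to an action with fewer other agents. The helper `is_mdo` relies on a consequence of the definition rather than a statement from the text: an outcome is maximal under the dispersion relation exactly when no agent can move to a less crowded action, that is, when the per-action counts differ by at most one.

```python
from collections import Counter

def one_step_more_dispersed(o, o_prime):
    """True iff o is reached from o_prime by one agent moving to an action
    with fewer other agents (the single-step relation D)."""
    if len(o) != len(o_prime):
        return False
    changed = [i for i, (a, b) in enumerate(zip(o, o_prime)) if a != b]
    if len(changed) != 1:
        return False
    i = changed[0]
    n_o = Counter(o)[o[i]]                    # agents on a_i in o, including i
    n_o_prime = Counter(o_prime)[o_prime[i]]  # agents on a'_i in o', including i
    return n_o < n_o_prime

def is_mdo(outcome, num_actions):
    """True iff no agent can move to an action with fewer other agents."""
    counts = Counter(outcome)
    per_action = [counts.get(a, 0) for a in range(num_actions)]
    return max(per_action) - min(per_action) <= 1

print(one_step_more_dispersed((0, 1, 2), (0, 0, 2)))
print(is_mdo((0, 1, 2), 3), is_mdo((0, 0, 2), 3))
```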

There  are  many  other  measures  that  we  could  use  instead 
of  the  qualitative  dispersion  relation.  Entropy  is  consistent 
with, but stronger than our dispersion relation: if $o \sqsupseteq o'$ then
the entropy of o is higher than that of o', but the converse is
not necessarily true. We have chosen to base our definitions
on the weaker dispersion relation because it is the most general,
and because it corresponds directly to a single agent’s
change  of  actions. 

Using this dispersion relation, we can state the formal definition
of strong DGs.

Definition 7 (Strong DG) A CACP game $G = (N, A, \succeq)$
is a strong dispersion game iff for all outcomes $o, o' \in O$, it
is the case that if $o \sqsupseteq o'$ but not $o' \sqsupseteq o$, then
$o \succeq o'$ but not $o' \succeq o$.

Recall that the preference relation $\succeq$ forms a total ordering
while the dispersion relation $\sqsupseteq$ forms a partial ordering.
Thus this definition requires that o is strictly preferred to o'
when o is strictly more dispersed than o'.

If the strong definition has such nice properties, why
bother to state the weak definition at all? There are many
situations which have a dispersion quality but which cannot
be modeled by games in the stronger class. Consider
the situation faced by Alice, Bob, and Charlie who are each
choosing among three possible roles in the founding of a
company: CEO, COO, and CFO. Because they will be compensated
as a group, the situation can be modeled as a CP
game. However, suppose that Bob would be a terrible CEO.
Clearly, the agents would most prefer an outcome in which
each role is filled and Bob is not CEO; thus the game satisfies
the weak definition. However, rather than have all roles
filled and Bob alone be CEO, they would prefer an outcome
in which Bob shares the CEO position with one of the other
agents (i.e., both Bob and another agent select the “CEO”
action), even though it leaves one of the other roles empty.
In other words, the preference relation conflicts with the
dispersion ordering, and the game does not satisfy the strong
definition.


all  MDOs  are  maximal  outcomes.  In  fact,  it  is  easy  to  verify  that 
this  must  be  the  case  in  a  fully  symmetric  DG. 


Non-Common-Preference  Dispersion  Games 

There are also several interesting classes of non-CP dispersion
games we might like to model. Due to space considerations
we will not define these classes formally, but instead
present a few motivating examples.

Consider again the load balancing application in which
each of n users simultaneously wishes to use one of k different
resources. If the users all belong to a single organization,
the interest of the organization can be well modeled
by a CP DG, since the productivity of the organization will
be highest if the users are as dispersed as possible among
the servers. However, the users’ preferences may be more
selfish: a user may prefer individually to use a resource with
the fewest possible other users, regardless of the welfare of
the rest of the group. Additionally, users’ preferences may
reflect some combination of individual and group welfare.
These problems may be modeled with the class of selfish
dispersion games.

Consider again the niche selection problem, in which each
of n oligopoly producers wishes to occupy one of k different
market niches. It may be the case that in addition to a general
preference for dispersal (presumably to avoid competition)
each producer has an exogenous preference for one of the
niches; these preferences may or may not be aligned. For
example, it may be that one of the market niches is larger
and thus preferred by all producers. Alternatively, a producer
may have competencies that suit it well for a particular
niche. Note that the two-agent case can be modeled
by what one might call the anti-battle-of-the-sexes game in
which a man and his ex-wife both wish to attend one of two
parties, one of which is more desirable, but both prefer not
to encounter each other (the reader familiar with the original
BoS game will appreciate the humor). These problems
can be modeled with the class of partial dispersion games,
in which agents’ preferences may align with either the
dispersion ordering or with a set of exogenous preferences.

Learning  Strategy  Definitions 

Now that we have defined a few interesting classes of dispersion
games, let us consider the task of playing them in a
repeated  game  setting.  There  are  two  perspectives  we  may 
adopt:  that  of  the  individual  agent  wishing  to  maximize  his 
individual  welfare,  and  that  of  a  system  designer  wishing  to 
implement  a  distributed  algorithm  for  maximizing  the  group 
welfare.  In  the  present  research,  we  adopt  the  latter. 

Let us begin with the problem of finding an MDO as
quickly as possible in a weak CACP DG.⁴ Note that this
problem is trivial if implemented as a centralized algorithm.
The problem is also trivial if implemented as a distributed
algorithm in which agents are allowed unlimited communication.
Thus we seek distributed algorithms that require
no explicit communication between agents. Each algorithm
takes the form of a set of identical learning rules for each
agent, each of which is a function mapping observed histories
to distributions over actions.

⁴ Note that any mixed strategy equilibrium outcome is necessarily
preference dominated by the pure strategy MDOs. For this
reason, we henceforth disregard mixed strategy equilibria, and
focus on the problem of finding one of the MDOs.

Consider the most naive distributed algorithm. In each
round, each agent selects an action randomly from the uniform
distribution, stopping only when the outcome is an
MDO. Note that this naive learning rule imposes very minimal
information requirements on the agents: each agent
must be informed only whether the outcome is an MDO.
Unfortunately, the expected number of rounds until convergence
to an MDO is

$$\frac{k^n}{\mathrm{MDO}(n, k)}.$$

It is easy to see that for k = n the expected time is $n^n/n!$,
which is exponential in n.
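
To get a feel for how quickly this grows, the sketch below evaluates the k = n case. Each round succeeds independently with probability $n!/n^n$, so the waiting time is geometric with mean $n^n/n!$; the function name is ours, introduced only for illustration.

```python
from math import factorial

def expected_rounds_naive(n):
    """Expected rounds for the naive rule when k = n, namely n^n / n!."""
    return n ** n / factorial(n)

for n in (2, 4, 8, 16):
    print(n, expected_rounds_naive(n))
```

By Stirling's approximation the ratio grows roughly like $e^n / \sqrt{2 \pi n}$, which is the exponential growth noted above.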

We  began  by  evaluating  traditional  learning  rules  from 
game  theory  and  artificial  intelligence.  Game  theory  offers 
a  plethora  of  options;  we  looked  for  simplicity  and  intuitive 
appropriateness.  We  considered  both  fictitious  play  (Brown 
1951;  Robinson  1951)  and  rational  learning  (Kalai  &  Lehrer 
1993).  Rational  learning  did  not  seem  promising  because  of 
its  dependence  on  the  strategy  space  and  initial  beliefs  of  the 
agents.  Thus  we  focused  our  attention  on  fictitious  play. 

In  evaluating  learning  rules  from  artificial  intelligence  the 
decision  was  more  straightforward.  Recently  there  has  been 
significant interest in the application of reinforcement learning
to the problem of multi-agent system learning (Littman
1994;  Claus  &  Boutilier  1998;  Brafman  &  Tennenholtz 
2000).  We  chose  to  implement  and  test  the  most  common 
reinforcement  learning  algorithm:  Q-learning. 

Finally,  we  developed  a  few  special  purpose  strategies  to 
take  advantage  of  the  special  structure  of  DGs. 

Note  that  the  different  strategies  we  describe  require 
agents  to  have  access  to  different  amounts  of  information 
about  the  outcome  of  each  round  as  they  play  the  game.  At 
one extreme, agents might need only a Boolean value signifying
whether or not the group has reached an MDO (this
is all that is required for the naive strategy). At the other
extreme,  agents  might  need  complete  information  about  the 
outcome,  including  the  action  choices  of  each  of  the  other 
agents. 

Fictitious  Play  Learning 

Fictitious  play  is  a  learning  rule  in  which  an  agent  assumes 
that  each  other  agent  is  playing  a  fixed  mixed  strategy.  The 
fictitious  play  agent  uses  counts  of  the  actions  selected  by 
the  other  agents  to  estimate  their  mixed  strategies  and  then 
at  each  round  selects  the  action  that  has  the  highest  expected 
value  given  these  beliefs.  Note  that  the  fictitious  play  rule 
places  very  high  information  requirements  on  the  agents.  In 
order to update their beliefs, agents must have full knowledge
of the outcome. Our implementation of fictitious play
includes  a  few  minor  modifications  to  the  basic  rule. 

One modification stems from the well known fact that
agents using fictitious play may never converge to equilibrium
play. Indeed our experiments show that fictitious play
agents in CP DGs often generate play that oscillates within
sets of outcomes, never reaching an MDO. This results from
the agents’ erroneous belief in the others’ use of a fixed
mixed strategy. To avoid this oscillation, we modify the
fictitious play rule with stochastic perturbations of agents’
beliefs as suggested by Fudenberg & Levine (1998). In particular,
we apply a uniform random variation of -1% to 1%
on the expected reward of each action before selecting the
agent’s best response.
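
The perturbation step can be sketched as follows. This is an illustrative reading rather than our exact implementation: it assumes the variation is multiplicative (each expected reward is scaled by an independent factor drawn uniformly from [0.99, 1.01]) and that ties after perturbation are broken by taking the first maximizer. The function and parameter names are hypothetical.

```python
import random

def perturbed_best_response(expected_rewards, rng=random, noise=0.01):
    """Return the index of the action with the highest expected reward after
    each reward is scaled by an independent uniform factor in
    [1 - noise, 1 + noise] (assumed multiplicative perturbation)."""
    perturbed = [r * (1.0 + rng.uniform(-noise, noise)) for r in expected_rewards]
    return max(range(len(perturbed)), key=perturbed.__getitem__)

rng = random.Random(0)
# A clear winner is never displaced; near-ties are broken at random.
print(perturbed_best_response([1.0, 2.0, 0.5], rng))
print([perturbed_best_response([1.0, 1.0], rng) for _ in range(10)])
```

The point of the design is that the noise only matters when two actions have nearly equal expected value, which is exactly the situation in which deterministic fictitious play can cycle.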

The other modifications were necessary to make the
agents’ computation within each round tractable for large
numbers of agents. Calculating the expected value of each
possible action at each round requires time that is exponential
in n. To avoid this, we store the history of play as counts
of observed outcomes rather than counts of each agent’s actions.
Also, instead of maintaining the entire history of play,
we use a bounded memory of observed outcomes. The predicted
joint mixed strategy of the other agents is then calculated
by assuming the observed outcomes within memory
are an unbiased sample.⁵

Reinforcement  Learning 

Reinforcement learning is a learning rule in which agents
learn a mapping from states to actions (Kaelbling, Littman,
& Moore 1996). We implemented the Q-learning algorithm
with a Boltzmann exploration policy. In Q-learning, agents
learn the expected reward of performing an action in a given
state. Our implementation of Q-learning includes a few minor
modifications to the basic algorithm.

It is well known that the performance of Q-learning is
extremely sensitive to a number of implementation details.
First, the choice of a state space for the agent’s Q-function
is critical. We chose to use only a single state, so that in
effect agents learn Q-values over actions only. Second, the
selection of initial Q-values and temperature is critical. We
found it best to set the initial Q-values to lie strictly within
the range of the highest possible payoff (i.e., being alone)
and the next highest (i.e., being with one other agent). We
chose to parameterize the Boltzmann learning function with
an initial low temperature. These choices allow agents that
initially choose a non-conflicting action to have high probability
of continuing to play this action, and allow those that
have collided with other agents to learn eventually the true
value of the action and successively choose other actions until
they find an action that does not conflict.

In our implementation we chose to give the agents a selfish
reward instead of the global common-preference reward.
The  reward  is  a  function  of  the  number  of  other  agents  that 
choose  the  same  action,  not  of  the  degree  of  dispersion  of 
the  group  as  a  whole.  This  selfish  reward  has  the  advantage 
of  giving  the  agents  a  signal  that  is  more  closely  tied  to  the 
effects  of  their  actions,  while  still  being  maximal  for  each 
agent  when  the  agents  have  reached  an  MDO. 
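
A minimal sketch of single-state Q-learning with Boltzmann exploration and a selfish reward is given below. The specific numbers are assumptions made for illustration and are not taken from our experiments: the reward 1/m for an agent sharing its action with m - 1 others, the learning rate, the temperature, and the initial Q-value of 0.75 (chosen to lie strictly between the payoff for being alone, 1.0, and the payoff for sharing with one other agent, 0.5). The update also omits any bootstrapped future-value term, which is one possible reading of learning with a single state.

```python
import math
import random

def boltzmann_probs(q_values, temperature):
    """Softmax over Q-values; a low temperature concentrates on the best action."""
    best = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - best) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def selfish_reward(num_on_action):
    """Hypothetical payoff: 1 when alone, 1/2 with one other agent, and so on."""
    return 1.0 / num_on_action

def play_round(all_q, temperature, alpha, rng):
    """One round: every agent samples an action, then updates its own Q-value."""
    k = len(all_q[0])
    actions = [rng.choices(range(k), weights=boltzmann_probs(q, temperature))[0]
               for q in all_q]
    for q, a in zip(all_q, actions):
        reward = selfish_reward(actions.count(a))
        q[a] += alpha * (reward - q[a])  # single state, no bootstrapped term
    return actions

rng = random.Random(0)
n = k = 5
# Initial Q-values strictly between the two highest payoffs (1.0 and 0.5).
all_q = [[0.75] * k for _ in range(n)]
for _ in range(200):
    actions = play_round(all_q, temperature=0.05, alpha=0.2, rng=rng)
print(sorted(actions))
```

With these initial values, an agent that finds itself alone sees its Q-value for that action rise above 0.75 and keeps playing it, while an agent that collides sees the value fall below 0.75 and drifts toward untried actions.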

Specialized  Strategies 

The first specialized strategy that we propose is the freeze
strategy. In the freeze strategy, an agent chooses actions
randomly until the first time she is alone, at which point
she continues to replay that action indefinitely, regardless
of whether other agents choose the same action. It is easy to
see that this strategy is guaranteed to converge in the limit,
and that if it converges it will converge to an MDO. The
freeze strategy also has the benefit of imposing very minimal
information requirements: it requires an agent to know
only how many agents chose the same action as she did in
the previous round.

⁵ The reader might be concerned that this approximation
changes the convergence properties of the rule. Although this may
be the case in some settings, in our experiments with small n no
difference was observed from those using the full history.
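
The freeze strategy is simple enough to simulate in a few lines. The sketch below is illustrative only: it assumes n ≤ k, encodes actions as indices 0..k-1, and counts rounds until every agent is alone; the function name and the trial sizes are our own choices, not the experimental setup reported later.

```python
import random

def freeze_rounds(n, k, rng, max_rounds=100_000):
    """Rounds until n <= k agents using the freeze strategy reach an MDO."""
    frozen = [None] * n  # action each agent has locked onto, if any
    for t in range(1, max_rounds + 1):
        actions = [f if f is not None else rng.randrange(k) for f in frozen]
        counts = [0] * k
        for a in actions:
            counts[a] += 1
        for i, a in enumerate(actions):
            if frozen[i] is None and counts[a] == 1:
                frozen[i] = a  # first time alone: replay this action forever
        if max(counts) <= 1:   # for n <= k, an MDO has every agent alone
            return t
    return None

rng = random.Random(0)
for n in (4, 8, 16, 32):
    trials = [freeze_rounds(n, n, rng) for _ in range(200)]
    print(n, sum(trials) / len(trials))
```

Note that unfrozen agents still randomize over all k actions, including those already claimed by frozen agents, which is the inefficiency that the next strategy removes.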

An improvement on the freeze strategy is the basic simple
strategy, which was originally suggested by Alpern (2001).
In this strategy, each agent begins by randomly choosing an
action. Then, if no other agent chose the same action, she
chooses the same action in the next round. Otherwise, she
randomizes over the set of actions that were either unoccupied
or selected by two or more agents. Note that the basic
simple strategy requires that agents know only which actions
had a single agent in them after each round.

Definition 8 (Basic Simple Strategy) Given an outcome
$o \in O$, an agent using the basic simple strategy will

• If $n^o_a = 1$, select action a with probability 1,

• Otherwise, select an action from the uniform distribution
over actions $a' \in A$ for which $n^o_{a'} \ne 1$.
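
The definition translates directly into a one-round update. The sketch below assumes n = k and actions indexed 0..k-1; `basic_simple_step` and `basic_simple_rounds` are illustrative names, and the simulation sizes are arbitrary.

```python
import random

def basic_simple_step(actions, k, rng):
    """Apply the basic simple strategy to every agent, given last round's actions."""
    counts = [0] * k
    for a in actions:
        counts[a] += 1
    # Actions that were unoccupied or chosen by two or more agents.
    candidates = [a for a in range(k) if counts[a] != 1]
    return [a if counts[a] == 1 else rng.choice(candidates) for a in actions]

def basic_simple_rounds(n, rng, max_rounds=100_000):
    """Rounds until n agents on n actions are all alone (an MDO)."""
    actions = [rng.randrange(n) for _ in range(n)]
    for t in range(1, max_rounds + 1):
        if len(set(actions)) == n:
            return t
        actions = basic_simple_step(actions, n, rng)
    return None

rng = random.Random(0)
for n in (4, 16, 64, 256):
    trials = [basic_simple_rounds(n, rng) for _ in range(100)]
    print(n, sum(trials) / len(trials))
```

Because an agent who must re-randomize always has her own (crowded) action among the candidates, the candidate set is never empty.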

We have extended the basic simple strategy to work in the
broader class of games for which $n \ne k$.

Definition 9 (Extended Simple Strategy) Given an outcome
$o \in O$, an agent using the extended simple strategy
will

• If $n^o_a \le \lceil n/k \rceil$, select action a with probability 1,

• Otherwise, select action a with probability $(n/k)/n^o_a$ and with
probability $(1 - (n/k)/n^o_a)$ randomize over the actions a' for
which $n^o_{a'} < \lceil n/k \rceil$.

Unlike the basic strategy, the extended strategy does not
assign uniform probabilities to all actions that were not chosen
by the correct number of agents. Consider agents reacting
to the outcome {2, 2, 0, 0}. In this case each agent is
better off staying with probability 0.5 and jumping to each
of the empty slots with probability 0.25, than randomizing
uniformly over all four slots. The extended simple strategy
can actually be further improved by assigning non-uniform
probabilities to the actions a' for which $n^o_{a'} < \lceil n/k \rceil$. We
have found empirically that the learning rule converges more
rapidly when agents place more probability on the actions
that have fewer other agents in them. Note that the extended
simple strategy requires that agents know the number
of agents selecting each action in the round; the identity of
these agents is not required, however.
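
The {2, 2, 0, 0} example can be reproduced with a short sketch of the extended rule. Two details are assumptions on our part and are labeled as such in the code: that an agent stays put whenever her action holds at most ⌈n/k⌉ agents, and that an agent on an over-full action stays with probability (n/k) divided by the number of agents on her action. The second assumption is chosen because it yields the 0.5 / 0.25 / 0.25 split quoted above; the function name is hypothetical.

```python
import math

def extended_simple_distribution(own_action, counts):
    """Distribution over next-round actions for an agent who chose own_action,
    given per-action agent counts from the previous round."""
    n, k = sum(counts), len(counts)
    target = math.ceil(n / k)
    n_a = counts[own_action]
    if n_a <= target:  # assumed condition: the action is not over-full
        return {own_action: 1.0}
    stay = (n / k) / n_a  # assumed stay probability
    under_full = [a for a in range(k) if counts[a] < target]
    dist = {own_action: stay}
    for a in under_full:
        dist[a] = (1.0 - stay) / len(under_full)  # uniform over under-full actions
    return dist

print(extended_simple_distribution(0, [2, 2, 0, 0]))
```

An over-full action always implies at least one under-full action, since the counts sum to n, so the division in the loop is safe.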

Experimental  Results 

The learning rules and strategies described above differ significantly
in the empirical time to converge. In Figure 2 we
plot as a function of n the convergence time of the learning
rules in repeated symmetric weak DGs, averaged over
1000 trials. Table 1 summarizes the observed performance
of each strategy (as well as the information requirements of
each strategy). We discuss the performance of each of the
strategies in turn.

Figure 2: Log-log plot of the empirical performance of different
strategies in symmetric CACP dispersion games.

Rule      Information Requirements   Avg. Rounds to Converge (f(n))
Naive     Whether MDO                EXP
FP        Full Information           EXP
RL        Num. in Own Action         POLY
Freeze    Num. in Own Action         LINEAR
BS & ES   Num. in All Actions        LOG

Table 1: Applicability of strategies to various classes of
games with information requirements and estimated complexity
class.

We  begin  with  the  learning  rules.  In  our  empirical  tests 
we  found  that  stochastic  fictitious  play  always  converged  to 
an  MDO.  However,  the  number  of  rounds  to  converge  was 
on  average  exponential  in  n.  In  our  empirical  tests  of  the 
reinforcement  learning  strategy  we  found  that  on  average 
play  converges  to  an  MDO  in  a  number  of  rounds  that  is 
linear in n. An interesting result is that for n > k, the algorithm
didn’t converge to a unique selection of actions for
each  agent,  but  rapidly  adopted  a  set  of  mixed  strategies  for 
the  agents  resulting  in  average  payoffs  close  to  the  optimal 
deterministic  policy. 

The specialized strategies generally exhibited better performance
than the learning rules. Our empirical observations
show that the number of rounds it takes for the freeze
strategy to converge to an MDO is linear in n. Our empirical
tests of both basic and extended simple strategies show that
on average, play converges to an MDO in a number of steps
that is logarithmic in the number of agents.⁶


⁶ For n > k certain ratios of n/k led consistently to superlogarithmic
performance; slight modifications of the extended simple
strategy were able to achieve logarithmic performance.

Discussion

In this paper we have introduced the class of DGs and defined
several important subclasses that display interesting
properties. We then investigated certain representative learning
rules and tested their empirical behavior in DGs. In the
future, we intend to continue this research in two primary
directions.

First, we would like to further investigate some new types
of DGs. We gave examples above of two classes of non-CP
dispersion games that model common problems, but due to
space limitations we were not able to define and characterize
them in this paper. On a different note, we are also interested
in a possible generalization of DGs which models the allocation
of some quantity associated with the agents, such as
skill or usage, to the different actions. We would like to define
these classes of games formally, and explore learning
rules that can solve them efficiently.

Second, we would like to continue the research on learning
in DGs that we have begun in this paper. The learning
rules we evaluated above are an initial exploration, and
clearly many other learning techniques also deserve consideration.
Additionally, we would like to complement the empirical
work presented here with some analytical results. As
a preliminary result, we can prove the following loose upper
bound on the expected convergence time of the basic simple
strategy.

Proposition 1 In a repeated fully symmetric weak dispersion
game with n agents and n actions, in which all agents
use the basic simple strategy, the expected number of rounds
until convergence to an MDO is in $O(n)$.

Informally, the proof is as follows. The probability that a
particular agent chooses an action alone is $((n-1)/n)^{n-1}$,
and so the expected number of rounds until she is alone is
just $(n/(n-1))^{n-1}$. Because of the linearity of expectation,
the expected number of rounds for all agents to find themselves
alone must be no more than $n^n/(n-1)^{n-1}$, which
is less than $ne$ for all $n > 1$. Using similar techniques it is
possible to show a quadratic bound on the expected convergence
time of the freeze strategy.
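
The final inequality holds because $(n/(n-1))^{n-1} = (1 + 1/(n-1))^{n-1}$ increases toward $e$ from below. A quick numerical check of the bound, with a function name chosen only for illustration:

```python
import math

def expected_rounds_upper_bound(n):
    """n * (n / (n - 1))**(n - 1), which equals n^n / (n - 1)^(n - 1)."""
    return n * (n / (n - 1)) ** (n - 1)

for n in (2, 5, 10, 100, 1000):
    bound = expected_rounds_upper_bound(n)
    print(n, round(bound, 3), bound < n * math.e)
```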

Unfortunately, our empirical results show that the basic
simple strategy converges in time that is logarithmic in n,
and that the freeze strategy converges in linear time. This
gap between our preliminary analysis and our empirical results
begs future analytical work. Is it possible to show a
tighter upper bound, for these learning rules or for others?
Can we show a lower bound?

We would also like to better understand the optimality of
learning rules. It is possible in principle to derive the optimal
reactive learning rule for any finite number of agents using
dynamic programming. Note that the optimal strategies
obtained using this method are arbitrarily complex, however.
For example, even upon reaching the simple outcome
{2, 2, 0, 0}, an optimal reactive strategy for each agent
chooses the same action with probability 0.5118 (not 0.5, as
the extended simple strategy would dictate).

Dispersion games clearly play an important role in cooperative
multiagent systems, and deserve much more discussion
and scrutiny. We view the results of this paper as opening
the door to substantial additional work on this exciting
class of games.

Abstract

We present two simple search methods for computing a sample
Nash equilibrium in a normal-form game: one for 2-player
games and one for n-player games. We test these algorithms
on many classes of games, and show that they perform
well against the state of the art: the Lemke-Howson algorithm
for 2-player games, and Simplicial Subdivision and
Govindan-Wilson for n-player games.

Introduction 

Game theory has had a profound impact on multi-agent systems
research, and indeed on computer science in general.
Nash equilibrium (NE) is arguably the most important concept
in game theory, and yet remarkably little is known about
the problem of computing a sample NE in a normal-form
game. All evidence points to this being a hard problem, but
its precise complexity is unknown (Papadimitriou 2001).

At the same time, several algorithms have been proposed
over the years for the problem. In this paper, three previous
algorithms will be of particular interest. For 2-player
games, the Lemke-Howson algorithm (Lemke & Howson
1964) is still the state of the art, despite being 40 years old.
For n-player games, until recently the algorithm based on
Simplicial Subdivision (van der Laan, Talman, & van der
Heyden 1987) was the state of the art. Indeed, these two algorithms
are the default ones implemented in Gambit (McKelvey,
McLennan, & Turocy 2003), the best-known game
theory software. Recently, a new algorithm, which we will
refer to as Govindan-Wilson, was introduced by Govindan
& Wilson (2003) and extended and efficiently implemented
by Blum, Shelton, & Koller (2003).

In a long version of this paper we provide more intuition
behind each of these methods. Here we simply note
that they have surfaced as the most competitive algorithms
for the respective class of games, and refer the reader
to two thorough surveys on the topic (von Stengel 2002;
McKelvey & McLennan 1996). Our goal in this paper is
to demonstrate that for both of these classes of games (2-player,
and n-player for n > 2) there exists a relatively
simple, search-based method that performs very well in
practice. For 2-player games, our algorithm performs substantially
better than Lemke-Howson. For n-player games,
our algorithm outperforms both Simplicial Subdivision and
Govindan-Wilson.

The basic idea behind our search algorithms is simple.
Recall that, while the general problem of computing a NE is
a complementarity problem, computing whether there exists
a NE with a particular support² for each player is a relatively
easy feasibility program. Our algorithms explore the
space of support profiles using a backtracking procedure to
instantiate the support for each player separately. After each
instantiation, they prune the search space by checking for
actions in a support that are strictly dominated, given that
the other agents will only play actions in their own supports.

Both algorithms order the search by giving precedence to
supports of small size. Since it turns out that games drawn
from classes that researchers have focused on in the past tend
to have (at least one) NE with a very small support, our algorithms
are often able to find one quickly. Thus, this paper
is as much about the properties of NE in games of interest as
it is about novel algorithmic insights.
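
The two ingredients just described, small-supports-first ordering and pruning by conditional dominance, can be sketched for the 2-player case as follows. This is an illustrative sketch and not the algorithm as specified later in the paper: the exact ordering key (total size first, then balance) is an assumption, the dominance test considers only domination by pure actions, and all function names are hypothetical.

```python
from itertools import combinations, product

def support_size_profiles(num_actions):
    """Yield (x_1, x_2) support-size pairs, smallest total size first
    (assumed ordering: by total size, then by how balanced the pair is)."""
    m1, m2 = num_actions
    sizes = product(range(1, m1 + 1), range(1, m2 + 1))
    yield from sorted(sizes, key=lambda x: (x[0] + x[1], abs(x[0] - x[1])))

def supports_of_size(num_player_actions, size):
    """All supports (as tuples of action indices) of a given size."""
    return combinations(range(num_player_actions), size)

def conditionally_dominated(payoff, action, own_actions, opp_support):
    """True iff some other pure action of the row player does strictly better
    than `action` against every opponent action in opp_support."""
    return any(
        all(payoff[alt][b] > payoff[action][b] for b in opp_support)
        for alt in own_actions if alt != action
    )

row_payoff = [[3, 0], [1, 1]]
print(list(support_size_profiles((2, 2))))
print(conditionally_dominated(row_payoff, 1, [0, 1], [0]))     # against column 0 only
print(conditionally_dominated(row_payoff, 1, [0, 1], [0, 1]))  # against both columns
```

The example shows why the dominance test is conditional: an action that survives against the opponent's full action set may be strictly dominated once the opponent is restricted to a smaller support, which lets the search discard that support profile early.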

We emphasize, however, that we are not cheating in the
selection of games on which we test. Past algorithms were
tested almost exclusively on “random” games. We tested on
these too (indeed, we will have more to say about how “random”
games vary along at least one important dimension),
but also on many other distributions (24 in total). To this
end we use GAMUT, a recently introduced computational
testbed for game theory (Nudelman et al. 2004). Our results
are quite robust across all games tested.

The rest of the paper is organized as follows. After formulating
the problem and the basis for searching over supports,
we describe our two algorithms. The n-player algorithm is
essentially a generalization of the 2-player algorithm, but we
describe them separately, both because they differ slightly
in the ordering of the search, and because the 2-player case
admits a simpler description of the algorithm. Then, we describe
our experimental setup, and separately present our results
for 2-player and n-player games. In the final section,
we conclude and describe opportunities for future work.


² The support specifies the pure strategies played with nonzero
probability.

Notation

We consider finite, n-player, normal-form games
$G = (N, (A_i), (u_i))$:

• $N = \{1, \ldots, n\}$ is the set of players.

• $A_i = \{a_{i1}, \ldots, a_{im_i}\}$ is the set of actions available to
player i, where $m_i$ is the number of available actions
for that player. We will use $a_i$ as a variable that takes
on the value of a particular action $a_{ij}$ of player i, and
$a = (a_1, \ldots, a_n)$ to denote a profile of actions, one for
each player. Also, let $a_{-i} = (a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n)$
denote this same profile excluding the action of player i,
so that $(a_i, a_{-i})$ forms a complete profile of actions. We
will use similar notation for any profile that contains an
element for each player.

• $u_i : A_1 \times \ldots \times A_n \to \mathbb{R}$ is the utility function for each
player i. It maps a profile of actions to a value.

Each player i selects a mixed strategy from the set
$\mathcal{P}_i = \{p_i : A_i \to [0, 1] \mid \sum_{a_i \in A_i} p_i(a_i) = 1\}$. A mixed strategy
for a player specifies the probability distribution used to
select the action that the player will play in the game. We
will sometimes use $a_i$ to denote the pure strategy in which
$p_i(a_i) = 1$. The support of a mixed strategy $p_i$ is the set
of all actions $a_i \in A_i$ such that $p_i(a_i) > 0$. We will use
$x = (x_1, \ldots, x_n)$ to denote a profile of values that specifies
the size of the support of each player.

Because agents use mixed strategies, $u_i$ is extended to
also denote the expected utility for player $i$ for a strategy
profile $p = (p_1, \ldots, p_n)$: $u_i(p) = \sum_{a \in A} p(a) u_i(a)$, where
$p(a) = \prod_{i \in N} p_i(a_i)$.
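The expected-utility formula can be illustrated with a minimal Python sketch. The function name `expected_utility`, the representation of a strategy profile as a list of per-player probability lists, and the matching-pennies payoff `u1` are assumptions made for this illustration only.

```python
from itertools import product

def expected_utility(u_i, profile):
    """Expected utility of one player under a mixed-strategy profile.

    u_i maps an action profile (a tuple of action indices) to a payoff;
    profile is a list holding one probability list per player.
    """
    total = 0.0
    for a in product(*(range(len(p)) for p in profile)):
        prob = 1.0
        for j, a_j in enumerate(a):
            prob *= profile[j][a_j]  # p(a) is the product of the p_j(a_j)
        total += prob * u_i(a)
    return total

# Matching pennies, payoff of player 1: +1 on a match, -1 otherwise.
u1 = lambda a: 1.0 if a[0] == a[1] else -1.0
print(expected_utility(u1, [[0.5, 0.5], [0.5, 0.5]]))
```

The loop enumerates every action profile, so the cost grows with the product of the action counts; this is acceptable for a sketch but not for large games.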

The primary solution concept for a normal form game is
that of Nash equilibrium. A mixed strategy profile is a Nash
equilibrium if no agent has incentive to unilaterally deviate.

Definition 1  A strategy profile $p^* \in \mathcal{P}$ is a Nash equilibrium
if: $\forall i \in N, a_i \in A_i : u_i(a_i, p^*_{-i}) \le u_i(p^*_i, p^*_{-i})$

Every finite, normal form game is guaranteed to have at
least one Nash equilibrium (Nash 1950).
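Definition 1 only requires checking pure-strategy deviations, which makes it easy to test a candidate profile. The following is a minimal sketch for the two-player case; the name `is_nash_2p`, the nested-list payoff matrices and the numerical tolerance `tol` are assumptions of this example.

```python
def is_nash_2p(U1, U2, p1, p2, tol=1e-9):
    """Check Definition 1 for a two-player game given as payoff matrices.

    U1[a1][a2] and U2[a1][a2] are the payoffs of players 1 and 2.
    """
    rows, cols = range(len(p1)), range(len(p2))
    # Expected payoff of each pure action against the opponent's mix.
    pay1 = [sum(p2[b] * U1[a][b] for b in cols) for a in rows]
    pay2 = [sum(p1[a] * U2[a][b] for a in rows) for b in cols]
    val1 = sum(p1[a] * pay1[a] for a in rows)
    val2 = sum(p2[b] * pay2[b] for b in cols)
    # No pure deviation may improve on the value of the profile itself.
    return max(pay1) <= val1 + tol and max(pay2) <= val2 + tol

# Matching pennies: the uniform profile is an equilibrium.
U1 = [[1, -1], [-1, 1]]
U2 = [[-1, 1], [1, -1]]
print(is_nash_2p(U1, U2, [0.5, 0.5], [0.5, 0.5]))
```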

Searching  Over  Supports 

The basis of our two algorithms is to search over the space
of possible instantiations of the support $S_i \subseteq A_i$ for each
player $i$. Given a support profile as input, Feasibility Program 1,
below, gives the formal description of a program for
finding a Nash equilibrium $p$ consistent with $S$ (if such a
strategy profile exists).3 In this program, $v_i$ corresponds to
the expected utility of player $i$ in an equilibrium. The first
two classes of constraints require that each player must be
indifferent between all actions within his support, and must
not strictly prefer an action outside of his support. These
imply that no player can deviate to a pure strategy that improves
his expected utility, which is exactly the condition for
the strategy profile to be a Nash equilibrium.

3 We note that the use of Feasibility Program 1 is not novel:
it was used by (Dickhaut & Kaplan 1991) in an algorithm which
enumerated all support profiles in order to find all Nash equilibria.


Because $p(a_{-i}) = \prod_{j \ne i} p_j(a_j)$, this program is linear for
$n = 2$ and nonlinear for all $n > 2$. Note that, strictly speaking,
we do not require that each action $a_i \in S_i$ be in the
support, because it is allowed to be played with zero probability.
However, player $i$ must still be indifferent between
action $a_i$ and each other action $a'_i \in S_i$.


Feasibility Program 1

Input: $S = (S_1, \ldots, S_n)$, a support profile

Output: NE $p$, if there exist both a strategy profile
$p = (p_1, \ldots, p_n)$ and a value profile $v = (v_1, \ldots, v_n)$ s.t.:

$\forall i \in N, a_i \in S_i : \sum_{a_{-i} \in S_{-i}} p(a_{-i}) u_i(a_i, a_{-i}) = v_i$

$\forall i \in N, a_i \notin S_i : \sum_{a_{-i} \in S_{-i}} p(a_{-i}) u_i(a_i, a_{-i}) \le v_i$

$\forall i \in N : \sum_{a_i \in S_i} p_i(a_i) = 1$

$\forall i \in N, a_i \in S_i : p_i(a_i) \ge 0$

$\forall i \in N, a_i \notin S_i : p_i(a_i) = 0$
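To make the five constraint classes concrete, here is a minimal sketch that checks whether a given pair of mixed strategies satisfies Feasibility Program 1 when n = 2. It is a checker, not a solver: it does not search for p, and it reads the values v_1 and v_2 off an arbitrary support action. The function name, the index-set representation of supports and the tolerance `tol` are assumptions of this example.

```python
def satisfies_program_2p(U1, U2, S1, S2, p1, p2, tol=1e-9):
    """Check the constraints of Feasibility Program 1 for n = 2.

    S1, S2 are sets of action indices; p1, p2 are probability lists;
    U1[a1][a2], U2[a1][a2] are the payoff matrices of the two players.
    """
    def valid(p, S):
        on_support = all(p[a] >= -tol for a in S)
        off_support = all(abs(p[a]) <= tol
                          for a in range(len(p)) if a not in S)
        sums_to_one = abs(sum(p[a] for a in S) - 1.0) <= tol
        return on_support and off_support and sums_to_one

    if not (valid(p1, S1) and valid(p2, S2)):
        return False
    # Expected payoff of every pure action against the opponent's strategy.
    pay1 = [sum(p2[b] * U1[a][b] for b in S2) for a in range(len(p1))]
    pay2 = [sum(p1[a] * U2[a][b] for a in S1) for b in range(len(p2))]
    for pay, S in ((pay1, S1), (pay2, S2)):
        v = pay[next(iter(S))]  # candidate value v_i
        for a, value in enumerate(pay):
            if a in S and abs(value - v) > tol:
                return False  # not indifferent within the support
            if a not in S and value > v + tol:
                return False  # an outside action is strictly preferred
    return True

# Matching pennies with full supports and uniform mixing.
U1 = [[1, -1], [-1, 1]]
U2 = [[-1, 1], [1, -1]]
print(satisfies_program_2p(U1, U2, {0, 1}, {0, 1}, [0.5, 0.5], [0.5, 0.5]))
```

A full implementation would instead hand the constraints to a linear programming solver for n = 2, as the experimental section of the paper describes.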

Algorithm  for  Two-Player  Games 

In this section we describe Algorithm 1, our 2-player algorithm
for searching the space of supports. There are three
keys to the efficiency of this algorithm. The first two are
the factors used to order the search space. Specifically,
Algorithm 1 considers every possible support size profile
separately, favoring support sizes that are balanced and small.
The motivation behind these choices comes from work such
as (McLennan & Berg 2002), which analyzes the theoretical
properties of the NE of games drawn from a particular
distribution. Specifically, for n-player games, the payoffs
for an action profile are determined by drawing a point
uniformly at random in a unit sphere. Under this distribution,
for n = 2, the probability that there exists a NE consistent
with a particular support profile varies inversely with the size
of the supports, and is zero for unbalanced support profiles.

The third key to Algorithm 1 is that it separately instantiates
each player’s support, making use of what we will call
“conditional (strict) dominance” to prune the search space.

Definition 2  An action $a_i \in A_i$ is conditionally dominated,
given a profile of sets of available actions $R_{-i} \subseteq A_{-i}$
for the remaining agents, if the following condition holds:

$\exists a'_i \in A_i \; \forall a_{-i} \in R_{-i} : u_i(a_i, a_{-i}) < u_i(a'_i, a_{-i})$.

The preference for small support sizes amplifies the advantages
of checking for conditional dominance. For example,
after instantiating a support of size two for the first
player, it will often be the case that many of the second
player’s actions are pruned, because only two inequalities
must hold for one action to conditionally dominate another.
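The conditional dominance test can be sketched in a few lines for the two-player case. The function name, the payoff matrix `U` (rows are the acting player's actions) and the example numbers are assumptions for illustration.

```python
def conditionally_dominated(U, a, R_other):
    """True if action `a` is conditionally (strictly) dominated.

    U[a][b] is the acting player's payoff when he plays a and the
    opponent plays b; R_other is the non-empty set of opponent actions
    still considered available.
    """
    return any(
        all(U[a][b] < U[alt][b] for b in R_other)
        for alt in range(len(U))
        if alt != a
    )

U = [[3, 0],
     [2, 2],
     [1, 3]]
print(conditionally_dominated(U, 1, {0}))     # restricted opponent set
print(conditionally_dominated(U, 1, {0, 1}))  # full opponent set
```

In this example the middle action is dominated once the opponent is restricted to a single action, but not when both opponent actions remain, which is the effect the paragraph above describes: small instantiated supports leave few inequalities to satisfy.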

Pseudo-code for Algorithm 1 is given below. Note that
this algorithm is complete, because it considers all support
size profiles, and because it only prunes actions that are
strictly dominated.




Algorithm 1

for all support size profiles $x = (x_1, x_2)$, sorted in increasing
order of, first, $|x_1 - x_2|$ and, second, $(x_1 + x_2)$ do
    for all $S_1 \subseteq A_1$ s.t. $|S_1| = x_1$ do
        $A'_2 \leftarrow \{a_2 \in A_2$ not cond. dominated, given $S_1\}$
        if $\nexists a_1 \in S_1$ cond. dominated, given $A'_2$ then
            for all $S_2 \subseteq A'_2$ s.t. $|S_2| = x_2$ do
                if $\nexists a_1 \in S_1$ cond. dominated, given $S_2$ then
                    if Feasibility Program 1 is satisfiable for
                    $S = (S_1, S_2)$ then
                        Return the found NE $p$
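The outer loop of Algorithm 1 depends only on the order in which support size profiles are visited. A minimal sketch of that ordering follows; the function name and the parameters `m1` and `m2` (the two players' action counts) are assumptions of this example.

```python
from itertools import product

def support_size_profiles_2p(m1, m2):
    """Support size profiles (x1, x2): most balanced first, then smallest."""
    sizes = product(range(1, m1 + 1), range(1, m2 + 1))
    return sorted(sizes, key=lambda x: (abs(x[0] - x[1]), x[0] + x[1]))

print(support_size_profiles_2p(3, 3))
# [(1, 1), (2, 2), (3, 3), (1, 2), (2, 1), (2, 3), (3, 2), (1, 3), (3, 1)]
```

Pure-strategy profiles, of size (1, 1), are therefore tried first, and every balanced profile is tried before any unbalanced one.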


Algorithm  for  N-Player  Games 

Algorithm 1 can be interpreted as using the general backtracking
algorithm (see, e.g., (Dechter 2003)) to solve a constraint
satisfaction problem (CSP) for each support size profile.
The variables in each CSP are the supports $S_i$, and
the domain of each $S_i$ is the set of supports of size $x_i$.
While the single constraint is that there must exist a solution
to Feasibility Program 1, an extraneous, but easier to
check, set of constraints is that no agent plays a conditionally
dominated action. The removal of conditionally dominated
strategies by Algorithm 1 is similar to using the AC-1
algorithm to enforce arc-consistency with respect to these
constraints. We use this interpretation to generalize Algorithm 1
for the n-player case. Pseudo-code for Algorithm 2 and its two
procedures, Recursive-Backtracking and Iterated Removal
of Strictly Dominated Strategies (IRSDS), is given below.4

IRSDS takes as input a domain for each player’s support.
For each agent whose support has been instantiated, the domain
contains only that instantiated support, while for each
other agent $i$ it contains all supports of size $x_i$ that were
not eliminated in a previous call to this procedure. On each
pass of the repeat-until loop, every action found in at least
one support of a player’s domain is checked for conditional
domination. If a domain becomes empty after the removal
of a conditionally dominated action, then the current
instantiations of the Recursive-Backtracking are inconsistent, and
IRSDS returns failure. Because the removal of an action can
lead to further domain reductions for other agents, IRSDS
repeats until it either returns failure or iterates through all
actions of all players without finding a dominated action.

Finally, we note that Algorithm 2 is not a strict generalization
of Algorithm 1, because it orders the support size
profiles first by size, and then by a measure of balance. The
reason for the change is that balance (while still significant)
is less important for n > 2 than it is for n = 2. For example,
under the model of (McLennan & Berg 2002), for
n > 2, the probability of the existence of a NE consistent
with a particular support profile is no longer zero when the
support profile is unbalanced.

4 Even though our implementation of the backtracking procedure
is iterative, for simplicity we present it here in its equivalent,
recursive form. Also, the reader familiar with CSPs will recognize
that we have employed very basic algorithms for backtracking and
for enforcing arc consistency, and we return to this point in the
conclusion.


Algorithm 2

for all $x = (x_1, \ldots, x_n)$, sorted in increasing order of,
first, $\sum_i x_i$ and, second, $\max_{i,j}(x_i - x_j)$ do
    $\forall i : S_i \leftarrow$ NULL    // uninstantiated supports
    $\forall i : D_i \leftarrow \{S_i \subseteq A_i : |S_i| = x_i\}$    // domain of supports
    if Recursive-Backtracking($S$, $D$, 1) returns a NE $p$ then
        Return $p$
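The ordering used by Algorithm 2 differs from that of Algorithm 1: total size comes first and balance second. A minimal sketch, assuming a hypothetical helper name and a list `num_actions` holding each player's action count:

```python
from itertools import product

def support_size_profiles(num_actions):
    """Order size profiles by total size, then by the largest difference."""
    sizes = product(*(range(1, m + 1) for m in num_actions))
    # max over i, j of (x_i - x_j) equals max(x) - min(x).
    return sorted(sizes, key=lambda x: (sum(x), max(x) - min(x)))

order = support_size_profiles([3, 3])
print(order[:4])
```

With two players and three actions each, the profile (2, 2) now comes after (1, 2) and (2, 1), whereas the ordering of Algorithm 1 would visit every balanced profile first.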


Procedure 1  Recursive-Backtracking

Input: $S = (S_1, \ldots, S_n)$: a profile of supports
       $D = (D_1, \ldots, D_n)$: a profile of domains
       $i$: index of next support to instantiate
Output: A Nash equilibrium $p$, or failure

if $i = n + 1$ then
    if Feasibility Program 1 is satisfiable for $S$ then
        Return the found NE $p$
    else
        Return failure
else
    for all $d_i \in D_i$ do
        $S_i \leftarrow d_i$
        $D_i \leftarrow D_i - \{d_i\}$
        if IRSDS($(\{S_1\}, \ldots, \{S_i\}, D_{i+1}, \ldots, D_n)$) succeeds then
            if Recursive-Backtracking($S$, $D$, $i + 1$) returns NE $p$ then
                Return $p$
    Return failure


Procedure 2  Iterated Removal of Strictly Dominated
Strategies (IRSDS)

Input: $D = (D_1, \ldots, D_n)$: profile of domains
Output: Updated domains, or failure

repeat
    changed $\leftarrow$ false
    for all $i \in N$ do
        for all $a_i \in d_i \in D_i$ do
            for all $a'_i \in A_i$ do
                if $\forall a_{-i} \in d_{-i} \in D_{-i}$, $u_i(a_i, a_{-i}) < u_i(a'_i, a_{-i})$ then
                    $D_i \leftarrow D_i - \{d_i \in D_i : a_i \in d_i\}$
                    changed $\leftarrow$ true
                    if $D_i = \emptyset$ then
                        return failure
until changed = false
return $D$
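Procedure 2 can be sketched in Python as follows. This is an illustrative reading of the pseudo-code, not the authors' implementation: the data layout (domains as lists of frozensets, a payoff callback `u(i, a)`) and all names are assumptions, and every input domain is assumed to be non-empty.

```python
from itertools import product

def irsds(domains, actions, u):
    """Iterated removal of conditionally dominated actions.

    domains[i]: list of candidate supports (frozensets) for player i.
    actions[i]: all actions of player i.
    u(i, a):    payoff of player i under the action profile a (a tuple).
    Returns the pruned domains, or None on failure.
    """
    domains = [list(d) for d in domains]
    n = len(domains)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            # Opponent actions found in at least one remaining support.
            others = [[None] if j == i else sorted(set().union(*domains[j]))
                      for j in range(n)]
            for a_i in sorted(set().union(*domains[i])):
                if not any(a_i in d for d in domains[i]):
                    continue  # already removed during this pass

                def pay(x, rest):
                    return u(i, rest[:i] + (x,) + rest[i + 1:])

                if any(all(pay(a_i, r) < pay(alt, r)
                           for r in product(*others))
                       for alt in actions[i] if alt != a_i):
                    # Drop every support that contains the dominated action.
                    domains[i] = [d for d in domains[i] if a_i not in d]
                    changed = True
                    if not domains[i]:
                        return None
    return domains

# Two players; player 0 has three actions, player 1 has two.
U1 = [[3, 0], [2, 2], [1, 3]]
U2 = [[1, 0], [0, 1], [0, 1]]
u = lambda i, a: (U1 if i == 0 else U2)[a[0]][a[1]]
start = [[frozenset({0})], [frozenset({0}), frozenset({1})]]
print(irsds(start, [[0, 1, 2], [0, 1]], u))
```

In the example, player 0's support is already instantiated to {0}, so player 1's second action becomes conditionally dominated and the support containing it is pruned from his domain.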


D1    Bertrand Oligopoly
D2    Bidirectional LEG, Complete Graph
D3    Bidirectional LEG, Random Graph
D4    Bidirectional LEG, Star Graph
D5    Covariance Game: $\rho = 0.9$
D6    Cov. Game: $\rho \in [-1/(N-1), 1]$
D7    Covariance Game: $\rho = 0$
D8    Dispersion Game
D9    Graphical Game, Random Graph
D10   Graphical Game, Road Graph
D11   Graphical Game, Star Graph
D12   Graphical Game, Small-World
D13   Minimum Effort Game
D14   Polymatrix Game, Complete Graph
D15   Polymatrix Game, Random Graph
D16   Polymatrix Game, Road Graph
D17   Polymatrix Game, Small-World
D18   Uniformly Random Game
D19   Traveler’s Dilemma
D20   Uniform LEG, Complete Graph
D21   Uniform LEG, Random Graph
D22   Uniform LEG, Star Graph
D23   Location Game
D24   War Of Attrition

Table 1: Descriptions of GAMUT distributions.
Experimental Results

To evaluate the performance of our algorithms we ran several
sets of experiments. All games were generated by
GAMUT (Nudelman et al. 2004), a test-suite that is capable
of generating games from a wide variety of classes of games
found in the literature. Table 1 provides a brief description
of the subset of distributions on which we tested.

A distribution of particular importance is the one most
commonly tested on in previous work: D18, the “Uniformly
Random Game”, in which every payoff in the game is drawn
independently from an identical uniform distribution. Also
important are distributions D5, D6, and D7, which fall under
a “Covariance Game” model studied by (Rinott & Scarsini
2000), in which the payoffs for the n agents for each action
profile are drawn from a multivariate normal distribution in
which the covariance $\rho$ between the payoffs of each pair of
agents is identical. When $\rho = 1$, the game is common-payoff,
while $\rho = -\frac{1}{n-1}$ yields minimal correlation, which
occurs in zero-sum games. Thus, by altering $\rho$, we can
smoothly transition between these two extreme classes of
games.
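The lower bound on the covariance can be checked with a short derivation. The sketch below writes the covariance as $\rho$ and assumes, for illustration only, that each agent's payoff has unit variance, so that the covariance matrix has ones on the diagonal; the symbols $\Sigma$, $I$ and $\mathbf{1}$ are introduced here and are not part of the original text.

```latex
\[
\Sigma = (1-\rho)\, I + \rho\, \mathbf{1}\mathbf{1}^{\top},
\qquad
\text{eigenvalues: } 1 + (n-1)\rho \ \text{(once)}, \quad 1 - \rho \ \text{($n-1$ times)}.
\]
\[
\Sigma \succeq 0
\iff
-\frac{1}{n-1} \le \rho \le 1 .
\]
```

A valid covariance matrix needs nonnegative eigenvalues, which gives the bound: for n = 2 the minimum is -1, and for n = 6 it is -0.2.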

Our experiments were executed on a cluster of 12
dual-processor, 2.4GHz Pentium machines, running Linux
2.4.20. We capped runs for all algorithms at 1800 seconds.
When describing the statistics used to evaluate the algorithms,
we will use “unconditional” to refer to the value of
the statistic when timeouts are counted as 1800 seconds, and
“conditional” to refer to its value excluding timeouts.
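The distinction between unconditional and conditional statistics can be made concrete with a small sketch. The function name `summarize` and the convention of marking a timeout with `None` are assumptions of this example; the 1800-second cap is the one stated above.

```python
from statistics import mean, median

CAP = 1800.0  # timeout in seconds

def summarize(runtimes, cap=CAP):
    """runtimes: seconds per instance, with None marking a timeout."""
    solved = [t for t in runtimes if t is not None]
    capped = [cap if t is None else t for t in runtimes]
    return {
        "unconditional_median": median(capped),
        "unconditional_average": mean(capped),
        "percent_solved": 100.0 * len(solved) / len(runtimes),
        # Undefined when no instance was solved.
        "conditional_average": mean(solved) if solved else None,
    }

print(summarize([1.0, 3.0, None, None]))
```

When no instance is solved the conditional average has no value, which is why a distribution with no solved games cannot be plotted on that metric.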

When n = 2, we solved Feasibility Program 1 using
CPLEX 8.0’s callable library. For n > 2, because the program
is nonlinear, we instead solved each instance of the
program by executing AMPL, using MINOS as the underlying
optimization package. Obviously, we could substitute in
any nonlinear solver; and, since a large fraction of our running
time is spent on AMPL and MINOS, doing so would
greatly affect the overall running time.

Before presenting the empirical results, we note that a
comparison of the worst-case running times of our two algorithms
and the three we tested against does not distinguish
between them, since there exist inputs for each which lead
to exponential time.

Results  for  Two-Player  Games 

In the first set of experiments, we compared the performance
of Algorithm 1 to that of Lemke-Howson (implemented in
Gambit, which added the preprocessing step of iterated removal
of weakly dominated strategies) on 2-player, 300-action
games drawn from 24 of GAMUT’s 2-player distributions.
Both algorithms were executed on 100 games drawn
from each distribution. The time is measured in seconds and
plotted on a logarithmic scale.

Figure 1(a) compares the unconditional median runtimes
of the two algorithms, and shows that Algorithm 1 performs
better on all distributions.5 However, this does not tell the
whole story. For many distributions, it simply reflects the

5 Obviously, the lines connecting data points across distributions
for a particular algorithm are meaningless; they were only added
to make the graph easier to read.


fact that there is a greater than 50% chance that the distribution
will generate a game with a pure strategy NE, which our
algorithm will then find quickly. Two other important statistics
are the percentage of instances solved (Figure 1(b)), and
the average runtime conditional on solving the instance
(Figure 1(c)). Here, we see that Algorithm 1 completes far more
instances on several distributions, and solves fewer on just a
single distribution (6 fewer, on D23). Additionally, even on
distributions for which we solve far more games, our conditional
average runtime is 1 to 2 orders of magnitude smaller.

Clearly, the hardest distribution for our algorithm is D6,
which consists of “Covariance Games” in which the covariance
$\rho$ is drawn uniformly at random from the range
$[-1, 1]$. In fact, neither Algorithm 1 nor Lemke-Howson
solved any of the games in another “Covariance Game” distribution
in which $\rho = -0.9$, and these results were omitted
from the graphs, because the conditional average is undefined
for these results. On the other hand, for the distribution
“CovarianceGame-Pos” (D5), in which $\rho = 0.9$, both
algorithms perform well.

To further investigate this continuum, we sampled 300
values for $\rho$ in the range $[-1, 1]$, with heavier sampling in the
transition region and at zero. For each such game, we plotted
a point for the runtime of both Algorithm 1 and Lemke-Howson
in Figure 1(d).6 The theoretical results of (Rinott &
Scarsini 2000) suggest that the games with lower covariance
should be more difficult for Algorithm 1, because they are
less likely to have a pure strategy Nash equilibrium. Nevertheless,
it is interesting to note the sharpness of the transition
that occurs in the $[-0.3, 0]$ interval. More surprisingly,
a similarly sharp transition also occurs for Lemke-Howson,
despite the fact that the two algorithms operate in unrelated
ways. Finally, it is important to note that the transition region
for Lemke-Howson is shifted to the right by approximately
0.3, and that, on instances in the easy region for both
algorithms, Algorithm 1 is still an order of magnitude faster.

In the third set of experiments we explore the scaling
behavior of both algorithms on the “Uniformly Random
Game” distribution (D18), as the number of actions increases
from 100 to 1000. For each multiple of 100, we
generated 20 games. Because space constraints preclude an
analysis similar to that of Figures 1(a) through 1(c), we instead
plot in Figure 1(e) the unconditional average runtime
over 20 instances for each data size, with a timeout counted
as 1800s. While Lemke-Howson failed to solve any game
with more than 600 actions and timed out on some 100-action
games, Algorithm 1 solved all instances, and, without
the help of cutoff times, still had an advantage of 2 orders of
magnitude at 1000 actions.

Results  for  N-Player  Games 

In the next set of experiments we compare Algorithm 2 to
Govindan-Wilson and Simplicial Subdivision (which was
implemented in Gambit, and thus combined with iterated
removal of weakly dominated strategies). First, to compare
performance on a fixed problem size we tested on 6-player,

6 The capped instances for Algorithm 1 were perturbed slightly
upward on the graph for clarity.
[Figure 1 panels, x-axis "Distribution": (a) Unconditional median runtime; (b) Percentage solved; (c) Average time on solved instances]

Figure 1: Comparison of Algorithm 1 and Lemke-Howson on 2-player games. Subfigures (a)-(d) are for 300-action games.

Figure 2: Comparison of Algorithm 2, Simplicial Subdivision, and Govindan-Wilson. Subfigures (a)-(d) are for 6-player,
5-action games.
5-action games drawn from 22 of GAMUT’s n-player distributions.7
While the numbers of players and actions appear
small, note that these games have 15625 outcomes and
93750 payoffs. Once again, Figures 2(a), 2(b), and 2(c)
show unconditional median runtime, percentage of instances
solved, and conditional average runtime, respectively.
Algorithm 2 has a very low unconditional median runtime, for
the same reason that Algorithm 1 did for two-player games,
and outperforms both other algorithms on all distributions.
While this dominance does not extend to the other two metrics,
the comparison still favors Algorithm 2.
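The quoted game size follows from simple counting: each outcome is one action profile, and each outcome carries one payoff per player. A minimal check, with a hypothetical helper name and assuming every player has the same number of actions:

```python
def game_size(num_players, num_actions):
    """Outcomes and payoff entries of a normal-form game."""
    outcomes = num_actions ** num_players
    return outcomes, num_players * outcomes

print(game_size(6, 5))  # (15625, 93750)
```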

We again investigate the relationship between $\rho$ and the
hardness of games under the “Covariance Game” model.
For general n-player games, minimal correlation under this
model occurs when $\rho = -\frac{1}{n-1}$. Thus, we can only study
the range $[-0.2, 1]$ for 6-player games. Figure 2(d) shows
the results for 6-player 5-action games. Algorithm 2, over
the range $[-0.1, 0]$, experiences a transition in hardness that
is even sharper than that of Algorithm 1. Simplicial Subdivision
also undergoes a transition, which is not as sharp,
that begins at a much larger value of $\rho$ (around 0.4). However,
the running time of Govindan-Wilson is only slightly
affected by the covariance, as it neither suffers as much for
small values of $\rho$ nor benefits as much from large values.

Finally, Figures 2(e) and 2(f) compare the scaling behavior
(in terms of unconditional average runtimes) of the three
algorithms: the former holds the number of players constant
at 6 and varies the number of actions from 3 to 8, while the
latter holds the number of actions constant at 5, and varies
the number of players from 3 to 8. In both experiments,
both Simplicial Subdivision and Govindan-Wilson solve no
instances for the largest two sizes, while Algorithm 2 still
finds a solution for most games.

Conclusion  and  Future  Work 

In this paper, we presented two algorithms for finding a sample
Nash equilibrium. Both use backtracking approaches
(augmented with pruning) to search the space of support profiles,
favoring supports that are small and balanced. Both
also outperform the current state of the art.

The most difficult games we encountered came from the
“Covariance Game” model, as the covariance approaches its
minimal value, and this is a natural target for future algorithm
development. We expect these games to be hard in
general, because, empirically, we found that as the covariance
decreases, the number of equilibria decreases, and the
equilibria that do exist are more likely to have support sizes
near one half of the number of actions, which is the support
size with the largest number of supports.
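The claim about the number of supports is a statement about binomial coefficients: a player with m actions has C(m, k) supports of size k, and this count peaks at k = m/2. A quick illustration, with a hypothetical helper name:

```python
from math import comb

def supports_by_size(m):
    """Number of supports of each size k for a player with m actions."""
    return {k: comb(m, k) for k in range(1, m + 1)}

counts = supports_by_size(10)
print(max(counts, key=counts.get), counts[5])  # 5 252
```

For the 300-action games used in the experiments the peak count, C(300, 150), is astronomically large, which is why equilibria with mid-sized supports are so costly for a search that enumerates supports.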

One direction for future work is to employ more sophisticated
CSP techniques. The main goal of this paper was
to show that our general search method performs well in
practice, and there are many other CSP search and inference
strategies which may improve its efficiency. Another
promising direction to explore is local search, in which the

7 Two distributions from the tests of 2-player games are missing
here, because they do not naturally generalize to more
than 2 players.


state space is the set of all possible supports, and the available
moves are to add or delete an action from the support of
a player. While the fact that no equilibrium exists for a particular
support does not give any guidance as to which neighboring
support to explore next, one could use a relaxation of
Feasibility Program 1 that penalizes infeasibility through an
objective function. More generally, our results show that AI
techniques can be successfully applied to this problem, and
we have only scratched the surface of possibilities along this
direction.
Abstract

We present GAMUT1, a suite of game generators designed
for testing game-theoretic algorithms. We explain
why such a generator is necessary, offer a way of visualizing
relationships between the sets of games supported by
GAMUT, and give an overview of GAMUT’s architecture.
We highlight the importance of using comprehensive test
data by benchmarking existing algorithms. We show surprisingly
large variation in algorithm performance across
different sets of games for two widely-studied problems:
computing Nash equilibria and multiagent learning in repeated
games.2

1.  Introduction 

Researchers in multiagent systems have become increasingly
interested in game theory as a modeling tool. This
has led to growing interest in computational problems associated
with game-theoretic domains. Two such problems
are computing Nash equilibria and learning to achieve good
payoffs in repeated games. It is often difficult to offer theoretical
guarantees about such algorithms’ performance: the
computational complexity of many algorithms for computing
Nash remains an interesting open problem [14], and
there is rarely anything that can be proven about the sort of
performance a learning algorithm will achieve without making
reference to the game it will play or the opponents it will
face. For these sorts of reasons, researchers needing to evaluate
algorithms for game-theoretic problems often choose
to perform empirical tests.

One general lesson that has been learned by researchers
working in a wide variety of different domains is that an
algorithm’s performance can vary substantially across different
“reasonable” distributions of problem instances, even


1 Available at http://gamut.stanford.edu

2 This work was supported by NSF grant IIS-0205633 and DARPA
grant F30602-00-2-0598.


when problem size is held constant [9]. When we examine
the empirical tests that have been performed on algorithms
that take games as their inputs, we find that they have typically
been small-scale and involved very particular choices
of games. Such tests can be appropriate for limited
proofs-of-concept, but cannot say much about an algorithm’s expected
performance in new domains. For this, a comprehensive
body of test data is required.

It is not obvious that a library of games should be difficult
to construct. After all, games (if we think for the moment
about normal-form representations) are simply matrices
with one dimension indexed by action for each player,
and one further dimension indexed by player. We can thus
generate games by taking the number of players and of actions
for each player as parameters, and populate the corresponding
matrix with real numbers generated uniformly at
random. Is anything further required?
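The naive generator described in this paragraph takes only a few lines. The sketch below is an illustration, not GAMUT's code: the function name, the dictionary representation of the payoff table and the payoff range [0, 1] are assumptions.

```python
import random
from itertools import product

def uniform_random_game(num_actions, low=0.0, high=1.0, seed=None):
    """Map each action profile to one uniformly random payoff per player.

    num_actions[i] is the number of actions of player i.
    """
    rng = random.Random(seed)
    n = len(num_actions)
    return {
        a: tuple(rng.uniform(low, high) for _ in range(n))
        for a in product(*(range(m) for m in num_actions))
    }

game = uniform_random_game([2, 3], seed=0)
print(len(game))  # 6 action profiles, each with 2 payoffs
```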

We set out to answer this question by studying sets of
games that have been identified as interesting by computer
scientists, game theorists, economists, political scientists
and others over the past 50 years. Our attempt to get a
sense of this huge literature led us to look at several hundred
books and papers, and to extract one or more sets of
games from more than a hundred sources. To our surprise,
we discovered two things.

First, for every one of the sets of games that we encountered,
the technique described above would generate a game
from that set with probability zero. More formally, all of
these sets are non-generic with respect to the uniform sampling
procedure. It is very significant to find that an unbiased
method of generating games has only an infinitesimal
chance of generating any of these games that have been
considered realistic or interesting. Since we know that algorithm
performance can depend heavily on the choice of test
data, it would be unreasonable to extrapolate from an algorithm’s
performance on random test data to its expected performance
on real-world problems. It seems that test data for
games must take the form of a patchwork of generators of
Figure 1. GAMUT Taxonomy (Partial)


different  sets  of  games. 

Second, we were surprised to find very little work that
aimed to understand, taxonomize or even enumerate non-generic
games in a holistic or integrative way. We came
across work on understanding generic games [7], and found
a complete taxonomy of two-player two-action games [15].
Otherwise, work that we encountered tended to fall into one
or both of two camps. First, some work aimed to describe and
characterize particular sets of games that were proposed
as reasonable models of real-world strategic situations or
that presented interesting theoretical problems. Second,
researchers proposed novel representations of games, explicitly
or implicitly identifying sets of games that could be
specified compactly in these representations.

In this paper we aim to fill this gap: to identify interesting
sets of non-generic games comprehensively and with as
little bias as possible. In the next section we describe this
effort, highlighting relationships between different sets of
games we encountered in our literature search and describing
issues that arose in the identification of game generation
algorithms. In section 3 we give experimental evidence that a
comprehensive test suite is required for the evaluation of
game-theoretic algorithms. For our two example problems,
computing Nash equilibria and learning in repeated games,
we show that performance for different algorithms varies
dramatically across different sets of games even when the
size of the game is held constant, and that performance on
random games can be a bad predictor of performance on
other games. Finally, in the appendix, we briefly describe
GAMUT’s architecture and implementation, including
discussion of how new games may easily be added.

2.  GAMUT 

For the initial version of GAMUT we considered only
games whose normal-form representations can be comfortably
stored in a computer. Note that this restriction does not
rule out games that are presented in a more compact representation
such as extensive form or graphical games; it
only rules out large examples of such games. It also rules
out games with infinite numbers of agents and/or of actions
and Bayesian games. We make no requirement that
games must actually be stored in normal form; in fact,
GAMUT supports a wide array of representations (see the
appendix). Some are complete (able to represent any game)
while other incomplete representations support only certain
sets of games. We will say that a given representation describes
a set of games compactly if its descriptions of games
in the set are exponentially shorter than the games’ descriptions
in normal form.

In total we identified 122 interesting sets of games in
our literature search, and we were able to find finite time
generative procedures for 71. These generative sets ranged
from specific two-by-two matrix games with little variation
(e.g., Chicken) to broad classes extensible in both number
of players and number of actions (e.g., games that can be
encoded compactly in the Graphical Game representation).


Arms Race             Grab the Dollar         Polymatrix Game
Battle of the Sexes   Graphical Game          Prisoner’s Dilemma
Bertrand Oligopoly    Greedy Game             Random Games
Bidirectional LEG     Guess 2/3 Average       Rapoport’s Distribution
Chicken               Hawk and Dove           Rock, Paper, Scissors
Collaboration Game    Local-Effect Game       Shapley’s Game
Compound Game         Location Game           Simple Inspection Game
Congestion Game       Majority Voting         Traveler’s Dilemma
Coordination Game     Matching Pennies        Uniform LEG
Cournot Duopoly       Minimum Effort Game     War of Attrition
Covariant Game        N-Player Chicken        Zero Sum Game
Dispersion Game       N-Player Pris Dilemma

Table 1. Game Generators in GAMUT


2.1.  The  Games 

To  try  to  understand  the  relationships  between  these  dif¬ 
ferent  sets  of  non-generic  games,  we  set  out  to  relate  them 
taxonomically.  We  settled  on  identifying  subset  relation¬ 
ships  between  the  different  sets  of  games.  Our  taxonomy 
is  too  large  to  show  in  full,  but  a  fragment  of  it  is  shown 
in  Figure  1.  To  illustrate  the  sort  of  information  that  can 
be  conveyed  by  this  figure,  we  can  see  that  all  Dispersion 
Games  [6]  are  Congestion  Games  [17]  and  that  all  Conges¬ 
tion  Games  have  pure-strategy  equilibria. 

Besides  providing  some  insight  into  the  breadth  of  gen¬ 
erators  included  in  GAMUT  and  the  relationships  between 
them,  our  taxonomy  also  serves  a  more  practical  purpose: 
allowing  the  quick  and  intuitive  selection  of  a  set  of  gener¬ 
ators.  If  GAMUT  is  directed  to  generate  a  game  from  a  set 
that  does  not  have  a  generator  (e.g.,  supermodular  games 
[13];  games  having  unique  equilibria)  it  chooses  uniformly 
at  random  among  the  generative  descendants  of  the  set  and 
then  generates  a  game  from  the  chosen  set.  GAMUT  also 
supports  generating  games  that  belong  to  multiple  intersect¬ 
ing  sets  (e.g.,  symmetric  games  having  pure-strategy  equi¬ 
libria);  in  this  case  GAMUT  chooses  uniformly  at  random 
among  the  generative  sets  that  are  descendants  of  all  the 
named  sets. 
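
The selection rule described above can be sketched as a walk over the subset graph. This is a minimal illustration, not GAMUT's implementation: the function names and the taxonomy fragment are assumptions, and only the chain Dispersion Games, Congestion Games, games with pure-strategy equilibria is taken from the text.

```python
import random

# Illustrative taxonomy fragment: each set maps to its direct subsets.
# Only Dispersion < Congestion < pure-strategy equilibria comes from the text;
# the remaining edges are hypothetical.
SUBSETS = {
    "PureStrategyEquilibria": ["CongestionGame", "CoordinationGame"],
    "CongestionGame": ["DispersionGame"],
    "SymmetricGame": ["DispersionGame", "RockPaperScissors"],
}
# Sets assumed to have a generator of their own.
GENERATIVE = {"CongestionGame", "CoordinationGame",
              "DispersionGame", "RockPaperScissors"}


def descendants(name, subsets):
    """Return `name` together with every set reachable through subset edges."""
    seen, stack = set(), [name]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(subsets.get(node, []))
    return seen


def choose_generator(named_sets, subsets, generative, rng):
    """Pick uniformly among generative sets that descend from all named sets."""
    if len(named_sets) == 1 and named_sets[0] in generative:
        return named_sets[0]  # the named set has its own generator
    common = set.intersection(*(descendants(s, subsets) for s in named_sets))
    candidates = sorted(common & generative)  # sorted for reproducibility
    if not candidates:
        raise ValueError("no generative set lies in the requested intersection")
    return rng.choice(candidates)
```

With this fragment, asking for symmetric games having pure-strategy equilibria can only return the Dispersion Game generator, since it is the only generative set below both named sets.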

The  data  we  collected  in  our  literature  search — including 
bibliographic  references,  pseudo-code  for  generating  games 
and  taxonomic  relationships  between  games — will  be  use¬ 
ful  to  some  researchers  in  its  own  right.  We  have  gathered 
this information into a database which is publicly available from http://gamut.stanford.edu as part of
the  GAMUT  release.  Besides  providing  more  information 
about  references  than  we  can  fit  into  a  conference-length  pa¬ 
per,  this  database  also  allows  users  to  navigate  according  to 
subset/superset  relationships  and  to  perform  searches. 

2.2.  The  Generators 

Roughly speaking, the sets of games that we enumerated in the taxonomy can be partitioned into two classes, reflected by different colored nodes in Figure 1. For some sets we were able to come up with an efficient algorithmic procedure that can, in finite time, produce a sample game from that set, and that has the ability to produce any game from that set. We call such sets generative. For others, we could
find  no  reasonable  procedure.  One  might  consider  a  rejec¬ 
tion  sampling  approach  that  would  generate  games  at  ran¬ 
dom  and  then  test  whether  they  belong  to  a  given  set  S. 
However,  if  S  is  non-generic — which  is  true  for  most  of  our 
sets,  as  discussed  above — such  a  procedure  would  fail  to 
produce  a  sample  game  in  any  finite  amount  of  time.  Thus, 
we  do  not  consider  such  procedures  as  generators. 

Cataloging  the  relationships  among  sets  of  games  and 
identifying  generators  prepared  us  for  our  next  task,  creat¬ 
ing  game  generators.  The  wrinkle  was  that  generative  al¬ 
gorithms  were  rarely  described  explicitly  in  the  literature. 
While  in  most  cases  coming  up  with  an  algorithm  was 
straightforward,  we  did  encounter  several  interesting  issues. 

Sometimes  an  author  defined  a  game  too  narrowly 
for  our  purposes.  Many  traditional  games  (e.g.,  Pris¬ 
oner’s  Dilemma)  are  often  defined  in  terms  of  precise 
payoffs.  Since  our  goal  was  to  construct  a  generator  ca¬ 
pable  of  producing  an  infinite  number  of  games  be¬ 
longing  to  the  same  set,  we  had  to  generalize  these 
games.  In  the  case  of  Prisoner’s  Dilemma,  we  can  gener¬ 
ate  any  game 

$$
\begin{pmatrix} R,R & S,T \\ T,S & P,P \end{pmatrix}
$$

which satisfies $T > R > P > S$ and $R > (S + T)/2$. (The
latter  condition  ensures  that  all  three  of  the  non-equilibrium 
outcomes  are  Pareto  optimal.)  Thus,  an  algorithm  for  gen¬ 
erating  an  instance  of  Prisoner’s  Dilemma  reduces  to  gener¬ 
ating  four  numbers  that  satisfy  the  given  constraints.  There 
is  one  subtlety  involved  with  this  approach  to  generalizing 
games. It is a well-known fact that a positive affine transformation of payoffs does not change the strategic situation modeled by the game. It is also a common practice to normalize payoffs to some standard range before reasoning about
games.  We  ensure  that  no  generator  ever  generates  instances 
that  differ  only  by  a  positive  affine  transformation  of  pay¬ 
offs. 
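
As a rough illustration of the Prisoner's Dilemma case, the sketch below draws four payoffs that satisfy the stated constraints. It is not GAMUT's code. Pinning T and S to the ends of the payoff range is one assumed way to avoid producing affinely equivalent instances, and the sampling distribution over R and P is likewise an arbitrary choice made here.

```python
import random


def sample_prisoners_dilemma(rng, low=-1.0, high=1.0):
    """Return (T, R, P, S) with T > R > P > S and R > (S + T) / 2.

    T and S are pinned to the ends of [low, high] (an assumption of this
    sketch), so two samples never differ only by a positive affine map.
    """
    T, S = high, low
    midpoint = (S + T) / 2
    while True:
        R = rng.uniform(midpoint, T)   # reward for mutual cooperation
        P = rng.uniform(S, R)          # punishment for mutual defection
        # Guard against the rare draw that lands exactly on a boundary.
        if T > R > P > S and R > midpoint:
            return T, R, P, S


def payoff_matrix(T, R, P, S):
    """Lay the payoffs out as in the matrix above: rows and columns are (C, D)."""
    return [[(R, R), (S, T)],
            [(T, S), (P, P)]]
```

Because the feasible region for R and P has positive volume, the guard loop terminates almost immediately; this differs from the rejection sampling over all games discussed earlier, where the target set is non-generic.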

In  other  cases  the  definition  of  a  set  was  too  broad, 
and  thus  had  to  be  restricted.  In  many  cases,  this  could  be 
achieved  via  an  appropriate  parametrization.  An  interesting 
example  of  this  is  the  set  of  Polymatrix  Games  [5].  These 
are n-player games with a very special payoff structure: every pair of agents plays a (potentially different) 2-player
game  between  them,  and  each  agent’s  utility  is  the  sum  of 
all  of  his  payoffs.  The  caveat,  however,  is  that  the  agent 
must  play  the  same  action  in  all  of  his  two-player  games.  We 
realized  that  these  games,  though  originally  studied  for  their 
computational  properties,  could  be  generalized  and  used  es¬ 
sentially  as  a  compact  representation  for  games  in  which 
each  agent  only  plays  two-player  games  against  some  sub¬ 
set  of  the  other  agents.  This  led  to  a  natural  parametriza- 
tion  of  polymatrix  games  with  graphs.  Nodes  of  the  graph 
now  represent  agents,  and  edges  are  labeled  with  different 
2-player games (see footnote 3). Thus, though we still can sample from the
set  of  all  polymatrix  games  using  a  complete  graph,  we  are 
now  able  also  to  focus  on  more  specific  and,  thus,  even  more 
structured  subsets. 
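
The graph parametrization can be made concrete with a small sketch. The data layout (a dictionary from edges to 2-player payoff tables) and the function name are assumptions made for illustration; the only properties taken from the text are that each edge carries a 2-player game, that an agent uses the same action in all of its games, and that utilities add up.

```python
def polymatrix_utility(agent, profile, edge_games):
    """Sum an agent's payoffs over the 2-player games on its incident edges.

    edge_games maps an edge (i, j) to a table where table[a_i][a_j] is the
    pair (payoff to i, payoff to j). profile[k] is the single action that
    agent k plays in every one of its 2-player games.
    """
    total = 0.0
    for (i, j), table in edge_games.items():
        if agent == i:
            total += table[profile[i]][profile[j]][0]
        elif agent == j:
            total += table[profile[i]][profile[j]][1]
    return total


# Hypothetical 3-agent example on the path graph 0 - 1 - 2.
coordination = [[(1, 1), (0, 0)],
                [(0, 0), (1, 1)]]
pennies = [[(1, -1), (-1, 1)],
           [(-1, 1), (1, -1)]]
edge_games = {(0, 1): coordination, (1, 2): pennies}

profile = (0, 0, 1)
utilities = [polymatrix_utility(k, profile, edge_games) for k in range(3)]
```

Storing one table per edge keeps the description size proportional to the number of edges, whereas the normal form grows exponentially in the number of agents.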

Sometimes  we  encountered  purely  algorithmic  difficul¬ 
ties.  For  example,  in  order  to  implement  geometric  games 
[18]  we  needed  data  structures  capable  of  representing  and 
performing  operations  on  abstract  sets  (such  as  finding  in¬ 
tersection,  or  enumerating  subsets). 

In  some  cases  one  parameterized  generator  was  able  to 
generate  games  from  many  different  sets.  For  example,  we 
implemented  a  single  generator  based  on  work  by  Rapoport 
[15]  which  demonstrated  that  there  are  only  85  strategically 
different  2x2  games,  and  so  did  not  need  to  implement  gen¬ 
erators  for  individual  2x2  games  mentioned  in  the  litera¬ 
ture.  We  did  elect  to  create  separate  generators  for  several 
very  common  games  (e.g.,  Matching  Pennies;  Hawk  and 
Dove).  We  also  used  our  taxonomy  to  identify  similar  sets 
of  games,  and  either  implemented  them  with  the  same  gen¬ 
erator  or  allowed  their  separate  generators  to  benefit  from 
sharing  common  algorithms  and  data  structures.  In  the  end 
we  built  35  parameterized  generators  to  support  all  of  the 
generative sets in our taxonomy; these are listed in Table 1.

The  process  of  writing  generators  presented  us  with  a 
nontrivial  software  engineering  task  in  creating  a  coherent 
and  easily-extensible  software  framework.  Once  the  frame¬ 
work  was  in  place,  incrementally  adding  new  generators  be¬ 
came  easy.  Some  of  these  implementation  details  are  de¬ 
scribed  in  the  Appendix. 

3.  Running  the  GAMUT 

At  the  beginning  of  this  paper  we  claimed  that  it  is  neces¬ 
sary  to  evaluate  game-theoretic  algorithms  on  a  wide  range 
of  distributions  before  empirical  claims  can  be  made  about 
the  algorithms’  strengths  and  weaknesses.  Of  course,  such 
a  claim  can  only  be  substantiated  after  a  test  suite  has  been 
constructed. In this section we show that top algorithms for two computational problems in game theory do indeed exhibit dramatic variation across distributions, implying that small performance tests would be unreliable.

3 Note that this is a strict subset of graphical games, where payoffs for each player also depend only on the actions of its neighbors, but it is not assumed that payoffs have the additive decomposition.

All  our  experiments  were  performed  using  a  cluster  of 
12  dual-CPU  2.4GHz  Xeon  machines  running  Linux  2.4.20, 
and  took  about  120  CPU-days  to  run.  We  capped  runs  for  all 
algorithms  at  30  minutes  (1800  seconds). 

3.1.  Computation  of  Nash  Equilibria 

One  of  the  most  interesting  computational  problems  in 
game  theory  is  computing  Nash  equilibria.  All  evidence 
suggests  that  this  is  a  hard  problem  (e.g.,  [4,  3]),  yet  the 
precise complexity class into which the problem falls is unknown [14]. In this section we use GAMUT to evaluate
three  algorithms’  empirical  properties  on  this  problem. 

3.1.1.  Experimental  Setup  The  best-known  game  theory 
software  package  is  Gambit  [12],  a  collection  of  state- 
of-the-art  algorithms.  For  two-player  games  the  Lemke- 
Howson  algorithm  [8]  is  best  and  is  used  by  default  in  Gam¬ 
bit.  For  n-player  games  Gambit  uses  an  algorithm  based  on 
Simplicial Subdivision [19]. In both cases, Gambit performs
iterative  removal  of  dominated  strategies  as  a  preprocess¬ 
ing  step.  Govindan  and  Wilson  [5]  introduced  an  alterna¬ 
tive  algorithm  based  on  a  continuation  method.  We  use  a 
recent  optimized  implementation,  the  GameTracer  package 
[1].  This  work  also  included  speedups  for  the  Govindan- 
Wilson  algorithm  on  the  special  cases  of  compact  graphi¬ 
cal  games  and  MAIDs,  but  because  we  expanded  all  games 
to their full normal forms Govindan-Wilson did not benefit from these extensions in our experiments.

One  factor  that  can  have  a  significant  effect  on  an  algo¬ 
rithm’s  runtime  is  the  size  of  its  input.  Since  our  goal  was 
to  investigate  the  extent  to  which  runtimes  vary  as  the  re¬ 
sult  of  differences  between  distributions,  we  studied  fixed- 
size  games.  To  make  sure  that  our  findings  were  not  artifacts 
of  any  particular  problem  size  we  compared  results  across 
several fixed problem sizes. We ran the Lemke-Howson algorithm on games with 2 players and 150 actions, and on games with 2 players and 300 actions. Because Govindan-Wilson is very similar to Lemke-Howson on two-player games and is not optimized for this case [1], we did not run it on these games. We ran Govindan-Wilson and Simplicial Subdivision on games with 6 players and 5 actions, and on games with 18 players and 2 actions. For each problem size and distribution, we generated 100 games.

Both  to  keep  our  machine-time  demands  manageable 
and  to  keep  the  graphs  in  this  paper  from  getting  too  clut¬ 
tered,  we  chose  not  to  use  all  of  the  GAMUT  generators. 
Instead,  we  chose  a  representative  slate  of  22  distributions 
from GAMUT. Some of our generators (e.g., Graphical Games, Polymatrix Games, and Local Effect Games, or LEGs)
are  parameterized  by  graph  structure;  we  split  these  into 
several sub-distributions based on the kind of graph used. Suffixes “-CG”, “-RG”, “-SG”, “-SW” and “-Road” indicate, respectively, complete, random, star-shaped, small-world, and road-shaped (see [20]) graphs. Another distribution that we decided to split was the Covariant Game distribution, which implements the random game model of [16].
In this distribution, payoffs for each outcome are generated from a multivariate normal distribution, with correlation between all pairs of players held at some constant $\rho$. With $\rho = 1$ these games are common-payoff, while $\rho = -\frac{1}{N-1}$ (for $N$ players) yields minimum correlation and leads to zero-sum games in the two-player case. Rinott and Scarsini show that the probability of the existence of a pure strategy Nash equilibrium in these games varies as a monotonic function of $\rho$, which makes the games computationally interesting. For these games, suffixes “-Pos”, “-Zero”, and “-Rand” indicate whether $\rho$ was held at 0.9, 0, or drawn uniformly at random from $[-\frac{1}{N-1}, 1]$.
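
To make the covariant model concrete, the following sketch draws one payoff vector per outcome with unit variance and a common pairwise correlation (called `rho` in the code). It is an illustration only, not GAMUT's generator: the function names are hypothetical, and the construction (adding a multiple of the sum of independent normals) is one standard way to obtain equicorrelated normals. It relies on the fact that a common correlation among N variables cannot be lower than -1/(N-1).

```python
import math
import random
from itertools import product


def correlated_payoffs(n_players, rho, rng):
    """Draw one payoff per player: unit variance, pairwise correlation rho."""
    if n_players < 2:
        raise ValueError("need at least two players")
    if not (-1.0 / (n_players - 1) - 1e-12 <= rho <= 1.0):
        raise ValueError("rho must lie in [-1/(N-1), 1]")
    if rho == 1.0:
        z = rng.gauss(0.0, 1.0)
        return [z] * n_players            # common-payoff outcome
    # With x_i = z_i + c * sum(z), the ratio cov/var equals rho when
    # N*c^2 + 2*c = rho / (1 - rho); solve the quadratic for c.
    s = rho / (1.0 - rho)
    c = (-1.0 + math.sqrt(max(0.0, 1.0 + n_players * s))) / n_players
    z = [rng.gauss(0.0, 1.0) for _ in range(n_players)]
    total = sum(z)
    scale = math.sqrt(1.0 - rho)          # rescale to unit variance
    return [scale * (zi + c * total) for zi in z]


def covariant_game(n_players, n_actions, rho, seed=0):
    """Map every action profile to an independently drawn payoff vector."""
    rng = random.Random(seed)
    profiles = product(range(n_actions), repeat=n_players)
    return {p: correlated_payoffs(n_players, rho, rng) for p in profiles}
```

For two players and `rho = -1` the two payoffs in every outcome cancel, which is the zero-sum case mentioned above; `rho = 1` gives every player the same payoff.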

Lemke-Howson,  Simplicial  Subdivision  and  Govindan- 
Wilson  are  all  very  complicated  path-following  numerical 
algorithms  that  offer  virtually  no  theoretical  guarantees. 
They  all  have  worst-case  running  times  that  are  at  least  ex¬ 
ponential,  but  it  is  not  known  whether  this  bound  is  tight. 
On  the  empirical  side,  very  little  previous  work  has  at¬ 
tempted  to  evaluate  these  algorithms.  The  best-known  em¬ 
pirical  results  [11,  21]  were  obtained  for  generic  games 
with  payoffs  drawn  independently  uniformly  at  random  (in 
GAMUT,  this  would  be  the  RandomGame  generator).  Our 
work  may  therefore  represent  the  first  systematic  attempt 
to  understand  the  empirical  behavior  of  these  algorithms  on 
non-generic  games. 

3.1.2.  Experimental  Results  Figure  2  shows  each  algo¬ 
rithm’s  performance  across  distributions  for  two  different 
input sizes. The Y-axis shows CPU time measured in seconds and plotted on a log scale. Column height indicates
median  runtime  over  100  instances,  with  the  error  bars 
showing  the  25th  and  75th  percentiles.  The  most  impor¬ 
tant  thing  to  note  about  this  graph  is  that  each  algorithm 
exhibits  highly  variable  behavior  across  our  distributions. 
This is less visible for the Govindan-Wilson algorithm on 18-player games, only because this algorithm’s runtime exceeds our cap for a majority of the problems. However, even on this dataset the error bars demonstrate that the distribution of runtimes varies substantially with the game distribution.
Moreover,  for  all  three  algorithms,  we  observe  that  this  vari¬ 
ation  is  not  an  artifact  of  one  particular  problem  size. 

Figure 2. Effect of Problem Size on Solver Performance

Figure 3 illustrates runtime differences both across and among distributions for 6-player 5-action games. (Though we do not have space to show them here, we observed qualitatively similar results for different input sizes and for the Lemke-Howson algorithm.) Each dot on the graph corresponds to a single run of an algorithm on a game. This graph shows that the distribution of algorithm runtimes varies substantially from one distribution to another, and cannot easily be inferred from 25th/50th/75th percentile figures such as Figure 2. The highly similar Simplicial Subdivision runtimes for Traveler’s Dilemma and Minimum Effort Games are explained by the fact that these games can be solved by iterated elimination of dominated strategies, a step not performed by the GameTracer implementation of Govindan-Wilson. We note that distributions that are related to each other in our taxonomy (e.g., all kinds of Graphical Games, LEGs, or Polymatrix Games) usually give rise to similar, but not identical, algorithmic behavior.

Figure 3 makes it clear that algorithms’ runtimes exhibit substantial variation and that algorithms often perform very differently on the same distributions. However, this figure makes it difficult for us to reach conclusions about the extent to which the algorithms are correlated. For an answer to this question, we turn to Figure 4. Each data point represents a single 6-player, 5-action game instance, with the X-axis representing runtime for Simplicial Subdivision and the Y-axis for Govindan-Wilson. Both axes use a log scale. This figure shows that when we focus on instances rather than on distributions, these algorithms are largely uncorrelated. Simplicial Subdivision does strictly better on 67.2% of the instances, while timing out on 24.7%. Govindan-Wilson wins on 24.7% and times out on 36.5%. It is interesting to note that if a game is easy for Simplicial Subdivision, then it will often be harder for Govindan-Wilson, but in general neither algorithm dominates.

Figure 3. Runtime Distribution for 6-player, 5-action Games


3.2.  Multiagent  Learning  in  Repeated  Games 

The  last  few  years  have  seen  a  surge  of  research  into  mul¬ 
tiagent  learning,  resulting  in  the  recent  proposal  of  several 
new  algorithms.  This  research  area  is  still  at  a  very  early 
stage,  particularly  with  respect  to  the  identification  of  the 
best  metrics  and  standards  of  performance  to  use  for  eval¬ 
uating  algorithms.  As  a  result,  we  do  not  claim  that  our  re¬ 
sults  demonstrate  anything  about  the  relative  merit  of  the  al¬ 
gorithms  we  study.  We  believe  it  is  clear,  however,  that  our 
results  show  that  these  algorithms’  performance  depends 
crucially  on  the  distributions  of  games  on  which  they  are 
run,  and  thus  that  GAMUT  will  be  a  useful  tool  for  re¬ 
searchers  in  the  multiagent  learning  community. 

3.2.1.  Experimental  Setup  We  used  three  learning  algo¬ 
rithms:  Minimax-Q  [10],  WoLF  [2],  and  a  version  of  the 
original  Q-learning  algorithm  for  single  agent  games  [22] 
modified  for  use  by  an  individual  player  in  a  repeated  game 
setting.  These  algorithms  have  received  much  study  in  re¬ 
cent  years;  they  each  have  very  different  performance  guar¬ 
antees,  strengths  and  weaknesses.  Single-agent  Q-learning 
assumes  away  the  multiagent  component,  and  thus  is  not 
guaranteed  to  converge  at  all  against  an  adaptive  opponent. 
Minimax-Q  plays  a  safety-level  strategy,  and  so  does  not 
necessarily converge to a best response. WoLF is a variable-learning-rate policy-hill-climbing algorithm that is designed to converge to a best response. Previous work in the literature has established that each of these algorithms is very sensitive to its parameter settings (e.g., learning rate) and that the best parameter settings usually vary from one game to the next. Since it is infeasible to perform per-game parameter tuning in an experiment involving tens of thousands of games, we determined parameter values that reproduced previously published results from [10, 2, 22] and then fixed these parameters for all experiments.

Figure 4. Correlation, 6-player, 5-action.

In  our  experiments  we  chose  to  focus  on  a  set  of  13  dis¬ 
tributions.  As  before,  we  keep  game  sizes  constant,  this  time 
at  2  actions  and  2  players  for  each  game.  Although  it  would 
also  be  interesting  to  study  performance  in  larger  games,  we 
decided  to  focus  on  a  simpler  setting  in  which  it  would  be 
easier  to  understand  the  results  of  our  experiments.  For  each 
distribution  we  generated  100  game  instances.  For  each  in¬ 
stance  we  performed  nine  different  pairings  (each  possible 
pairing  of  the  three  algorithms,  including  self-pairings,  and 
in  the  case  of  non-self-pairings  also  allowing  each  algorithm 
to  play  once  as  player  1  and  once  as  player  2).  We  ran  the  al¬ 
gorithms  on  each  pairing  ten  times,  since  we  found  that  al¬ 
gorithm  performance  varied  based  on  the  outcomes  of  coin 
flips.  On  each  run,  we  repeated  the  game  100,000  times.  The 
first  90,000  rounds  allow  the  algorithms  to  settle  into  their 
long-run  behavior;  we  then  compute  each  algorithm’s  pay¬ 
off  for  each  game  as  its  average  payoff  over  the  following 
10,000  rounds.  We  did  this  to  approximate  the  offline  per¬ 
formance  of  the  learned  policy  and  to  minimize  the  effect 
of  relative  differences  in  the  algorithms’  learning  rates. 
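
The bookkeeping of this protocol can be written down in a few lines. The sketch below is illustrative only (the names are hypothetical, and the learners themselves are not implemented): it enumerates the nine ordered pairings, scores a run by its last 10,000 rounds, and counts the total number of runs implied by the figures in the text.

```python
from itertools import product

ALGORITHMS = ["Minimax-Q", "WoLF", "Q-learning"]


def pairings(algorithms):
    """All ordered (player 1, player 2) pairs, self-pairings included."""
    return list(product(algorithms, repeat=2))


def settled_average(payoffs, burn_in=90_000):
    """Average payoff over the rounds that follow the burn-in period."""
    tail = payoffs[burn_in:]
    return sum(tail) / len(tail)


# Numbers taken from the setup described above.
DISTRIBUTIONS, INSTANCES, REPEATS = 13, 100, 10
total_runs = DISTRIBUTIONS * INSTANCES * len(pairings(ALGORITHMS)) * REPEATS
```

Three self-pairings plus six ordered non-self-pairings give the nine pairings, so the design amounts to 117,000 runs of 100,000 rounds each.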

3.2.2.  Experimental  Results  There  are  numerous  ways  in 
which  learning  algorithms  can  be  evaluated.  In  this  section 
we  focus  on  just  two  of  them.  A  more  comprehensive  set 
of  experiments  would  be  required  to  judge  the  relative  mer¬ 
its  of  algorithms,  but  this  smaller  set  of  experiments  is  suf¬ 
ficient  to  substantiate  our  claim  that  algorithm  performance 
varies significantly from one distribution to another.

Figure 6. Median Payoffs as Player 1


Figure 5 compares the pairwise performance of the three algorithms. The height of a bar along the Y-axis indicates the
(normalized)  fraction  of  games  in  which  the  corresponding 
algorithm  received  a  weakly  greater  payoff  than  its  oppo¬ 
nent.  In  this  metric  we  ignore  the  magnitude  of  payoffs, 
since  in  general  they  are  incomparable  across  games.  The 
overall  conclusion  that  we  can  draw  from  this  graph  is  that 
there  is  great  variation  in  the  relative  performance  of  algo¬ 
rithms  across  distributions.  There  is  no  clear  “winner”;  even 
Minimax-Q,  which  is  usually  outperformed  by  WoLF,  man¬ 
ages  to  win  a  significant  fraction  of  games  across  many  dis¬ 
tributions,  and  dominates  it  on  Traveler’s  Dilemma.  WoLF 
and  single-agent  Q  come  within  10%  of  a  tie  most  of  the 
time — suggesting  that  these  algorithms  often  converge  to 
the  same  equilibria — but  their  performance  is  still  far  from 
consistent  across  different  distributions. 

Figure  6  compares  algorithms  using  a  different  metric. 
Here the Y-axis indicates the average payoff for an algorithm when playing as player 1, with column heights indicating the median and error bars indicating 25th and 75th percentiles. Payoffs are normalized to fall on the range [-1, 1]. Despite this normalization, it is difficult to make
meaningful  comparisons  of  payoff  values  across  distribu¬ 
tions.  This  graph  is  interesting  because,  while  focusing  on 
relative  performance  rather  than  trying  to  identify  a  “win¬ 
ning”  algorithm,  it  demonstrates  again  that  the  algorithms’ 
performance  varies  substantially  along  the  GAMUT.  More¬ 
over,  this  metric  shows  Minimax-Q  to  be  much  more  com¬ 
petitive  than  was  suggested  by  Figure  5. 

4.  Conclusion 

In  this  paper  we  presented  GAMUT,  a  game  theory  test 
suite.  We  surveyed  hundreds  of  books  and  papers  to  compile 
a  comprehensive  database  of  structured  non-generic  games 
and  the  relationships  between  them.  We  built  a  highly  mod¬ 
ular  and  extensible  software  framework,  and  used  it  to  im¬ 
plement  generators  for  these  sets  of  games.  Finally,  we 
demonstrated  the  importance  of  comprehensive  test  data 
to game-theoretic algorithms by showing how performance depends crucially on the distribution of instances on which an algorithm is run. We hope that GAMUT will become a useful tool for researchers working at the intersection of game theory and computer science.

Appendix: GAMUT Implementation

The GAMUT software was built using an object-oriented framework and implemented in Java (see footnote 4). Our framework consists of objects in four basic categories: game generators,
graphs,  functions,  and  representations.  Our  main  design  ob¬ 
jective  was  to  make  it  as  easy  as  possible  for  end  users  to 
write  new  objects  of  any  of  the  four  kinds,  in  order  to  al¬ 
low  GAMUT  to  be  extended  to  support  new  sets  of  games 
and  representations. 

Currently,  GAMUT  contains  35  implementations  of 
Game objects, which correspond to the 35 procedures we identified in Section 2.2. They are listed in Table 1. While the internal representations and algorithms used vary depending on the set of games being generated, all of them must be able to return the number of players, the number of actions for each player, and the payoff for each player for
any  action  profile.  Outputter  classes  then  encode  gener¬ 
ated  games  into  appropriate  representations. 
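
The contract described in this paragraph can be sketched as follows. GAMUT itself is written in Java and its actual class and method names are not given here, so every identifier below is hypothetical; the sketch only mirrors the three queries a game must answer and shows an outputter-like function that enumerates payoffs from them.

```python
from abc import ABC, abstractmethod
from itertools import product


class Game(ABC):
    """Hypothetical stand-in for the interface every generated game offers."""

    @abstractmethod
    def num_players(self): ...

    @abstractmethod
    def num_actions(self, player): ...

    @abstractmethod
    def payoff(self, player, profile): ...


class MatchingPennies(Game):
    def num_players(self):
        return 2

    def num_actions(self, player):
        return 2

    def payoff(self, player, profile):
        # Player 0 wins on a match, player 1 wins on a mismatch.
        win = 1 if profile[0] == profile[1] else -1
        return win if player == 0 else -win


def payoff_list(game):
    """Outputter-style helper: list every action profile with its payoffs."""
    players = range(game.num_players())
    ranges = [range(game.num_actions(p)) for p in players]
    return [(profile, [game.payoff(p, profile) for p in players])
            for profile in product(*ranges)]
```

Because the helper relies only on the three queries, it works for any game object regardless of how that game stores its payoffs internally.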

Many of our generators depend on random graphs (e.g., Graphical Games, Local Effect Games, Polymatrix Games) and functions (e.g., Arms Race, Bertrand Oligopoly, Congestion Games). Graph and Function classes, listed in Table 2, have been implemented to meet these needs in a modular way. As with games, additional classes of functions and graphs can be easily added.

4 See http://gamut.stanford.edu for detailed software documentation.

GAMUT Graph Classes:

Barabasi-Albert PLOD            Power-Law Out-Degree
Complete Graph                  Random Graph
N-Ary Tree                      Ring Graph
N-Dimensional Grid              Small World Graph
N-Dimensional Wrapped Grid      Star Graph

GAMUT Function Classes:

Exponential Function            Polynomial Function
Log Function                    Table Function
Decreasing Wrapper              Increasing Polynomial

GAMUT Outputter Classes:

Complete Representations        Incomplete Representations
Default GAMUT Payoff List       Local-Effect Form
Extensive Form                  Two-Player Readable Matrix Form
Gambit Normal Form
Game Tracer Normal Form
Graphical Form

Table 2. GAMUT Support Classes

Outputter  classes  encapsulate  the  notion  of  represen¬ 
tation.  GAMUT  allows  for  representations  to  be  incom¬ 
plete  and  to  work  only  with  compatible  generators;  how¬ 
ever,  most  output  representations  work  with  all  game  gen¬ 
erators.  Table  2  lists  the  complete  and  incomplete  represen¬ 
tations  that  are  currently  supported  by  GAMUT. 

In  keeping  with  our  main  goal  of  easy  extensibility, 
GAMUT  also  implements  a  wide  range  of  support  classes 
that  encapsulate  common  tasks.  For  example,  GAMUT  uses 
a  powerful  parameter  handling  mechanism.  Users  who  want 
to  create  a  new  generator  can  specify  types,  ranges,  default 
values  and  help  strings  for  parameters.  Given  this  informa¬ 
tion,  user  help,  parsing,  and  even  randomization  will  all  be 
handled  automatically.  Since  a  large  (and  mundane)  part  of 
the  user’s  job  now  becomes  declarative,  it  is  easy  to  focus 
on  the  more  interesting  and  conceptual  task  of  implement¬ 
ing  the  actual  generative  algorithm. 

Other  support  utilities  offer  the  ability  to  convert 
games  into  fixed-point  arithmetic  and  to  normalize  pay¬ 
offs.  The  former,  besides  often  being  more  efficient, 
sometimes  makes  more  sense  game-theoretically:  the  no¬ 
tion  of  a  Nash  equilibrium  can  become  muddy  with  floating 
point,  since  imprecision  can  lead  to  equilibrium  instabil¬ 
ity.  As  mentioned  in  section  2.2,  games’  strategic  properties 
are  preserved  under  positive  affine  transformations.  Nor¬ 
malization  allows  payoff  magnitudes  to  be  compared  and 
can  avoid  machine  precision  problems. 
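
Both utilities are simple to illustrate. The functions below are a sketch under stated assumptions rather than GAMUT's code: normalization is taken to be a per-player positive affine map onto a target range, and fixed-point conversion is taken to mean scaling by a power of ten and rounding to integers.

```python
def normalize(payoffs, lo=0.0, hi=1.0):
    """Map one player's payoffs onto [lo, hi] with a positive affine map."""
    smallest, largest = min(payoffs), max(payoffs)
    if largest == smallest:
        return [lo] * len(payoffs)   # constant payoffs: nothing to scale
    scale = (hi - lo) / (largest - smallest)   # positive, so order is kept
    return [lo + scale * (u - smallest) for u in payoffs]


def to_fixed_point(payoffs, digits=3):
    """Represent payoffs as integers with `digits` implied decimal places."""
    factor = 10 ** digits
    return [round(u * factor) for u in payoffs]
```

Since the scale factor is positive, the ranking of outcomes for the player is unchanged, which is why the strategic properties survive normalization. Integer payoffs allow exact comparisons, which avoids the instability that rounding errors can introduce when checking equilibrium conditions.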

ABSTRACT

We  present  a  new  approach  to  representing  coalitional  games 
based  on  rules  that  describe  the  marginal  contributions  of 
the  agents.  This  representation  scheme  captures  character¬ 
istics  of  the  interactions  among  the  agents  in  a  natural  and 
concise  manner.  We  also  develop  efficient  algorithms  for  two 
of  the  most  important  solution  concepts,  the  Shapley  value 
and  the  core,  under  this  representation.  The  Shapley  value 
can  be  computed  in  time  linear  in  the  size  of  the  input.  The 
emptiness  of  the  core  can  be  determined  in  time  exponen¬ 
tial  only  in  the  treewidth  of  a  graphical  interpretation  of  our 
representation. 

Categories  and  Subject  Descriptors 

I.2.11 [Distributed Artificial Intelligence]: Multiagent
systems;  J.4  [Social  and  Behavioral  Sciences]:  Eco¬ 
nomics;  F.2  [Analysis  of  Algorithms  and  Problem  Com¬ 
plexity] 

General  Terms 

Algorithms,  Economics 

Keywords 

Coalitional  game  theory,  Representation,  Treewidth 

1.  INTRODUCTION 

Agents  can  often  benefit  by  coordinating  their  actions. 
Coalitional  games  capture  these  opportunities  of  coordina¬ 
tion  by  explicitly  modeling  the  ability  of  the  agents  to  take 
joint actions as primitives. As an abstraction, coalitional games assign a payoff to each group of agents in the game.
This  payoff  is  intended  to  reflect  the  payoff  the  group  of 
agents  can  secure  for  themselves  regardless  of  the  actions 
of  the  agents  not  in  the  group.  These  choices  of  primitives 
are in contrast to those of non-cooperative games, in which
agents  are  modeled  independently,  and  their  payoffs  depend 
critically  on  the  actions  chosen  by  the  other  agents. 

1.1  Coalitional  Games  and  E-Commerce 

Coalitional  games  have  appeared  in  the  context  of  e-com¬ 
merce.  In  [7],  Kleinberg  et  al.  use  coalitional  games  to  study 
recommendation  systems.  In  their  model,  each  individual 
knows  about  a  certain  set  of  items,  is  interested  in  learning 
about  all  items,  and  benefits  from  finding  out  about  them. 
The payoff to a group of agents is the total number of distinct items known by its members. Given this coalitional game setting, Kleinberg et al. compute how much the private information of the agents is worth to the system using the solution concept of the Shapley value (definition can be
found  in  section  2).  These  values  can  then  be  used  to  deter¬ 
mine  how  much  each  agent  should  receive  for  participating 
in  the  system. 
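
A small worked example may help fix ideas. The sketch below encodes the payoff rule just described (the number of distinct items a group knows) and computes Shapley values by brute force, averaging each agent's marginal contribution over all orderings. The agents and items are invented for illustration, and the exhaustive enumeration is exponential in the number of agents; it is not the algorithm of Kleinberg et al. nor the method developed in this paper.

```python
from fractions import Fraction
from itertools import permutations


def coalition_value(coalition, knows):
    """Payoff of a group: number of distinct items known by its members."""
    items = set()
    for agent in coalition:
        items |= knows[agent]
    return len(items)


def shapley_values(knows):
    """Average marginal contribution of each agent over all orderings."""
    agents = sorted(knows)
    totals = {a: Fraction(0) for a in agents}
    orderings = list(permutations(agents))
    for order in orderings:
        members = []
        for agent in order:
            before = coalition_value(members, knows)
            members.append(agent)
            totals[agent] += coalition_value(members, knows) - before
    return {a: totals[a] / len(orderings) for a in agents}


# Hypothetical data: which items each individual knows about.
knows = {"ann": {"x", "y"}, "bob": {"y", "z"}, "cat": {"z"}}
values = shapley_values(knows)
```

In this game each item's unit of value is shared equally among the agents who know it, so the result can be checked by hand: ann receives 3/2, bob 1, and cat 1/2, which together account for the three distinct items.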

As another example, consider the economics behind supply
chain formation. The increased use of the Internet as a
medium for conducting business has decreased the costs for
companies to coordinate their actions, and therefore coalitional
games are a good model for studying the supply chain
problem. Suppose that each manufacturer purchases his raw
materials from some set of suppliers, and that the suppliers
offer higher discounts for larger purchases. The decrease in
communication costs lets manufacturers find others interested
in the same set of suppliers more cheaply, and facilitates
the formation of coalitions to bargain with the suppliers.
Depending on the set of suppliers and how much each coalition
purchases from each supplier, we can assign payoffs to
the coalitions according to the discounts they receive. The
resulting game can be analyzed using coalitional game theory,
and we can answer questions such as whether coalitions are
stable and how to fairly divide the benefits among the
participating manufacturers. A similar problem, combinatorial
coalition formation, has previously been studied in [8].

1.2 Evaluation Criteria for Coalitional Game Representation

To capture the coalitional games described above and perform
computations on them, we must first find a representation
for these games. The naive solution is to enumerate
the payoffs to each set of agents, which requires space
exponential in the number of agents in the game. For the
two applications described, the number of agents in the system
can easily exceed a hundred; this naive approach will
not scale to such problems. Therefore, it is critical to
find good representation schemes for coalitional games.

We  believe  that  the  quality  of  a  representation  scheme 
should  be  evaluated  by  four  criteria. 

Expressivity:  the  breadth  of  the  class  of  coalitional  games 
covered  by  the  representation. 

Conciseness:  the  space  requirement  of  the  representation. 

Efficiency:  the  efficiency  of  the  algorithms  we  can  develop 
for  the  representation. 

Simplicity:  the  ease  of  use  of  the  representation  by  users 
of  the  system. 

The ideal representation should be fully expressive, i.e., it
should be able to represent any coalitional game, use as
little space as possible, have efficient algorithms for computation,
and be easy to use. The goal of this paper is to
develop a representation scheme that has properties close to
the ideal representation.

Unfortunately, given that the number of degrees of freedom
of coalitional games is $O(2^n)$, not all games can be represented
concisely using a single scheme due to information-theoretic
constraints. For any given class of games, one may
be able to develop a representation scheme that is tailored
to it and more compact than a general scheme. For example, for
the recommendation system game, a highly compact representation
would be one that simply states which agents know
of which products, and lets the algorithms that operate on
the representation compute the values of coalitions appropriately.
For some problems, however, there may not be
efficient algorithms for customized representations. By having
a general representation and efficient algorithms that go
with it, the representation will be useful as a prototyping
tool for studying new economic situations.

1.3  Previous  Work 

The question of coalitional game representation has only
been sparsely explored in the past [2, 3, 4]. In [4], Deng
and Papadimitriou focused on the complexity of different
solution concepts on coalitional games defined on graphs.
While the representation is compact, it is not fully expressive.
In [2], Conitzer and Sandholm looked into the problem
of determining the emptiness of the core in superadditive
games. They developed a compact representation scheme
for such games, but again the representation is not fully
expressive. In [3], Conitzer and Sandholm developed a
fully expressive representation scheme based on decomposition.
Our work extends and generalizes the representation
schemes in [3, 4] through decomposing the game into a set of
rules that assign marginal contributions to groups of agents.
We will give a more detailed review of these papers in section
2.2 after covering the technical background.

1.4  Summary  of  Our  Contributions 

• We develop the marginal contribution networks representation,
a fully expressive representation scheme
whose size scales according to the complexity of the
interactions among the agents. We believe that the
representation is also simple and intuitive.

• We develop an algorithm for computing the Shapley
value of coalitional games under this representation
that runs in time linear in the size of the input.

• Under the graphical interpretation of the representation,
we develop an algorithm for determining
whether a payoff vector is in the core, and whether the
core is empty, in time exponential only in the treewidth
of the graph.

2.  PRELIMINARIES 

In this section, we will briefly review the basics of coalitional
game theory and its two primary solution concepts,
the Shapley value and the core.$^1$ We will also review previous
work on coalitional game representation in more detail.
Throughout this paper, we will assume that the payoff to
a group of agents can be freely distributed among its members.
This assumption is often known as the transferable
utility assumption.

2.1  Technical  Background 

We can represent a coalitional game with transferable utility
by the pair $(N, v)$, where

• $N$ is the set of agents; and

• $v : 2^N \to \mathbb{R}$ is a function that maps each group of
agents $S \subseteq N$ to a real-valued payoff.

This representation is known as the characteristic form. As
there are exponentially many subsets, it will take space
exponential in the number of agents to describe a coalitional
game.

An outcome in a coalitional game specifies the utilities
the agents receive. A solution concept assigns to each coalitional
game a set of “reasonable” outcomes. Different solution
concepts attempt to capture in some way outcomes
that are stable and/or fair. Two of the best known solution
concepts are the Shapley value and the core.

The Shapley value is a normative solution concept. It
prescribes a “fair” way to divide the gains from cooperation
when the grand coalition (i.e., $N$) is formed. The payoff
assigned to agent $i$ is the average marginal contribution of
agent $i$ over all possible permutations of the agents. Formally,
let $\phi_i(v)$ denote the Shapley value of $i$ under characteristic
function $v$; then$^2$

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{s!\,(n-s-1)!}{n!} \left( v(S \cup \{i\}) - v(S) \right) \tag{1}$$

The Shapley value is a solution concept that satisfies many
nice properties, and has been studied extensively in the economic
and game-theoretic literature. It has a very useful
axiomatic characterization.

Efficiency (EFF) A total of $v(N)$ is distributed to the
agents, i.e., $\sum_{i \in N} \phi_i(v) = v(N)$.

Symmetry (SYM) If agents $i$ and $j$ are interchangeable,
then $\phi_i(v) = \phi_j(v)$.

Dummy (DUM) If agent $i$ is a dummy player, i.e., his
marginal contribution to all groups $S$ is the same, then
$\phi_i(v) = v(\{i\})$.

Additivity (ADD) For any two coalitional games $v$ and
$w$ defined over the same set of agents $N$, $\phi_i(v + w) =
\phi_i(v) + \phi_i(w)$ for all $i \in N$, where the game $v + w$ is
defined as $(v + w)(S) = v(S) + w(S)$ for all $S \subseteq N$.

$^1$ The materials and terminology are based on the textbooks
by Mas-Colell et al. [9] and Osborne and Rubinstein [11].
$^2$ As a notational convenience, we will use the lower-case letter
to represent the cardinality of a set denoted by the corresponding
upper-case letter.

We will refer to these axioms later in our proof of correctness
of the algorithm for computing the Shapley value under our
representation in section 4.
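
To make the definition concrete, here is a minimal Python sketch that evaluates the Shapley value directly from a characteristic function by summing over all subsets, as in the definition above. The function name `shapley_values` and the three-agent example game are illustrative assumptions, not part of the paper, and the running time is exponential in the number of agents.

```python
from itertools import combinations
from math import factorial


def shapley_values(agents, v):
    """Shapley value of every agent, computed by brute force.

    `v` maps a frozenset of agents to a number (the characteristic function).
    """
    n = len(agents)
    phi = {}
    for i in agents:
        others = [a for a in agents if a != i]
        total = 0.0
        for s in range(n):  # s = |S|, with S ranging over subsets of N \ {i}
            weight = factorial(s) * factorial(n - s - 1) / factorial(n)
            for S in combinations(others, s):
                S = frozenset(S)
                total += weight * (v(S | {i}) - v(S))
        phi[i] = total
    return phi


# Hypothetical game: a coalition is worth 1 if it has at least two members.
agents = ["a", "b", "c"]
v = lambda S: 1.0 if len(S) >= 2 else 0.0
phi = shapley_values(agents, v)
```

In this symmetric example the three agents receive equal shares, and the shares add up to the value of the grand coalition, in line with (SYM) and (EFF).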

The core is another major solution concept for coalitional
games. It is a descriptive solution concept that focuses on
outcomes that are “stable.” Stability under the core means that
no set of players can jointly deviate to improve their payoffs.
Formally, let $x(S)$ denote $\sum_{i \in S} x_i$. An outcome $x \in \mathbb{R}^n$ is
in the core if

$$\forall S \subseteq N: \quad x(S) \ge v(S) \tag{2}$$

The core was one of the first proposed solution concepts
for coalitional games, and has been studied in detail. An
important question for a given coalitional game is whether
the core is empty, in other words, whether there is any
outcome that is stable relative to group deviation. For a
game to have a non-empty core, it must satisfy the property
of balancedness, defined as follows. Let $1_S \in \mathbb{R}^n$ denote the
characteristic vector of $S$ given by

$$(1_S)_i = \begin{cases} 1 & \text{if } i \in S \\ 0 & \text{otherwise} \end{cases}$$

Let $(\lambda_S)_{S \subseteq N}$ be a set of weights such that each $\lambda_S$ is in the
range between 0 and 1. This set of weights, $(\lambda_S)_{S \subseteq N}$, is a
balanced collection if for all $i \in N$,

$$\sum_{S \subseteq N} \lambda_S (1_S)_i = 1$$

A game is balanced if for all balanced collections of weights,

$$\sum_{S \subseteq N} \lambda_S v(S) \le v(N) \tag{3}$$

By the Bondareva-Shapley theorem, the core of a coalitional
game is non-empty if and only if the game is balanced.
Therefore, we can use linear programming to determine
whether the core of a game is empty.

$$\begin{array}{lll}
\underset{\lambda \in \mathbb{R}^{2^n}}{\text{maximize}} & \sum_{S \subseteq N} \lambda_S v(S) & \\
\text{subject to} & \sum_{S \subseteq N} \lambda_S (1_S)_i = 1 & \forall i \in N \\
 & \lambda_S \ge 0 & \forall S \subseteq N
\end{array} \tag{4}$$

If the optimal value of (4) is greater than the value of the
grand coalition, then the core is empty. Unfortunately, this
program has a number of variables that is exponential in the
number of players in the game, and hence an algorithm that operates
directly on this program would be infeasible in practice.
In section 5.4, we will describe an algorithm that answers
the question of emptiness of the core by working on the dual of
this program instead.
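
As a small illustration of condition (2), the following Python sketch checks core membership by enumerating every coalition. It is a brute-force check under stated assumptions: the name `in_core` and both example games are hypothetical, only inequality (2) is tested (that the payoff vector distributes exactly $v(N)$ is taken for granted), and the enumeration is exponential in the number of agents.

```python
from itertools import combinations


def in_core(agents, v, x, tol=1e-9):
    """Check x(S) >= v(S) for every non-empty coalition S.

    Returns (True, None), or (False, S) for the first violated coalition S.
    """
    for size in range(1, len(agents) + 1):
        for S in combinations(agents, size):
            if sum(x[i] for i in S) < v(frozenset(S)) - tol:
                return False, frozenset(S)
    return True, None


agents = ["a", "b", "c"]
# Hypothetical games: a majority game and a purely additive game.
majority = lambda S: 1.0 if len(S) >= 2 else 0.0
additive = lambda S: float(len(S))

majority_ok, witness = in_core(agents, majority, {a: 1.0 / 3 for a in agents})
additive_ok, _ = in_core(agents, additive, {a: 1.0 for a in agents})
```

The coalition returned on failure plays the role of a certificate that the payoff vector is not in the core.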


2.2  Previous  Work  Revisited 

Deng and Papadimitriou looked into the complexity of
various solution concepts on coalitional games played on
weighted graphs in [4]. In their representation, the
agents are the nodes of the graph, and the value of a set of
agents $S$ is the sum of the weights of the edges spanned by
them. Notice that this representation is concise since the
space required to specify such a game is $O(n^2)$. However,
this representation is not general; it will not be able to represent
interactions among three or more agents. For example,
it will not be able to represent the majority game, where a
group of agents $S$ will have value of 1 if and only if $s > n/2$.
On the other hand, there is an efficient algorithm for computing
the Shapley value of the game, and for determining
whether the core is empty under the restriction of positive
edge weights. However, in the unrestricted case, determining
whether the core is non-empty is coNP-complete.
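
The weighted-graph representation can be sketched in a few lines of Python. The edge weights below are hypothetical, and the function names are illustrative. The Shapley computation splits each edge weight equally between its two endpoints, which follows from applying the (SYM), (DUM), and (ADD) axioms to each edge viewed as its own game; it is offered as an illustration, not as a transcription of the algorithm in [4].

```python
def graph_game_value(S, edge_weights):
    """Value of coalition S: total weight of edges with both endpoints in S."""
    S = set(S)
    return sum(w for (i, j), w in edge_weights.items() if i in S and j in S)


def graph_game_shapley(agents, edge_weights):
    """Each edge is a two-agent game, so its weight is split equally."""
    phi = {a: 0.0 for a in agents}
    for (i, j), w in edge_weights.items():
        phi[i] += w / 2
        phi[j] += w / 2
    return phi


# Hypothetical three-agent graph game.
agents = [1, 2, 3]
edge_weights = {(1, 2): 3.0, (2, 3): -1.0, (1, 3): 2.0}
phi = graph_game_shapley(agents, edge_weights)
```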

Conitzer and Sandholm in [2] considered coalitional games
that are superadditive. They described a concise representation
scheme that only states the value of a coalition if the
value is strictly superadditive. More precisely, the semantics
of the representation is that for a group of agents $S$,

$$v(S) = \max_{\{T_1, T_2, \ldots, T_n\} \in \Pi} \sum_i v(T_i)$$

where $\Pi$ is the set of all possible partitions of $S$. The value
$v(S)$ is only explicitly specified for $S$ if $v(S)$ is greater than
the value of every partition of $S$ other than the trivial partition ($\{S\}$).
While this representation can represent all games that are
superadditive, there are coalitional games that it cannot represent.
For example, it will not be able to represent any
games with substitutability among the agents. An example
of a game that cannot be represented is the unit game,
where $v(S) = 1$ as long as $S \neq \emptyset$. Under this representation,
the authors showed that determining whether the core
is non-empty is coNP-complete. In fact, even determining
the value of a group of agents is NP-complete.
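
The partition semantics can be illustrated with a short recursive sketch in Python. Everything here is an assumption made for illustration: the helper name `make_value_function`, the example values, and the convention that an unspecified singleton is worth 0. The search is exhaustive, which is consistent with the hardness result just mentioned; it is not the procedure of [2].

```python
from functools import lru_cache
from itertools import combinations


def make_value_function(explicit):
    """Return v(S) = best total over all partitions of S into blocks.

    `explicit` maps tuples of agents to explicitly specified values.
    Unspecified singletons are assumed to be worth 0 (an assumption).
    """
    explicit = {frozenset(k): val for k, val in explicit.items()}

    @lru_cache(maxsize=None)
    def value(S):
        if not S:
            return 0.0
        # Trivial partition {S}: only available if S is explicit or a singleton.
        best = explicit.get(S, 0.0 if len(S) == 1 else float("-inf"))
        first = min(S)
        rest = S - {first}
        # Split S into a block T containing `first` and a non-empty remainder.
        for k in range(len(rest)):
            for extra in combinations(sorted(rest), k):
                T = frozenset((first,) + extra)
                best = max(best, value(T) + value(S - T))
        return best

    return value


# Hypothetical specification: three singletons and one synergistic pair.
value = make_value_function({(1,): 1.0, (2,): 1.0, (3,): 1.0, (1, 2): 3.0})
```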

In a more recent paper, Conitzer and Sandholm described
a representation that decomposes a coalitional game into a
number of subgames that sum to the original game
[3]. The payoffs in these subgames are then represented by
their respective characteristic functions. This scheme is fully
general as the characteristic form is a special case of this
representation. For any given game, there may be multiple
ways to decompose the game, and the decomposition may
influence the computational complexity. For computing the
Shapley value, the authors showed that the complexity is
linear in the input description; in particular, if the largest
subgame (as measured by number of agents) is of size $n$ and
the number of subgames is $m$, then their algorithm runs
in $O(m 2^n)$ time, where the input size will also be $O(m 2^n)$.
On the other hand, the problem of determining whether a
certain outcome is in the core is coNP-complete.

3.  MARGINAL  CONTRIBUTION  NETS 

In this section, we will describe the Marginal Contribution
Networks representation scheme. We will show that the idea
is flexible, and we can easily extend it to increase its conciseness.
We will also show how we can use this scheme to
represent the recommendation game from the introduction.
Finally, we will show that this scheme is fully expressive,
and generalizes the representation schemes in [3, 4].

3.1 Rules and Marginal Contribution Networks

The basic idea behind marginal contribution networks
(MC-nets) is to represent coalitional games using sets of
rules. The rules in MC-nets have the following syntactic
form:

$$\text{Pattern} \to \text{value}$$

A rule is said to apply to a group of agents $S$ if $S$ meets
the requirement of the Pattern. In the basic scheme, these
patterns are conjunctions of agents, and $S$ meets the requirement
of the given pattern if $S$ is a superset of it. The
value of a group of agents is defined to be the sum over the
values of all rules that apply to the group. For example, if
the set of rules is

$$\{a \wedge b\} \to 5$$

$$\{b\} \to 2$$

then $v(\{a\}) = 0$, $v(\{b\}) = 2$, and $v(\{a, b\}) = 5 + 2 = 7$.
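
The additive semantics of the basic scheme can be sketched in Python as follows. The encoding of a rule as a `(pattern, value)` pair and the name `mcnet_value` are illustrative choices, not notation from the paper; the two rules are the ones from the example above.

```python
def mcnet_value(S, rules):
    """Sum the values of all rules whose pattern is contained in S."""
    S = frozenset(S)
    return sum(value for pattern, value in rules if frozenset(pattern) <= S)


# The two rules from the example: {a AND b} -> 5 and {b} -> 2.
rules = [({"a", "b"}, 5), ({"b"}, 2)]

v_a = mcnet_value({"a"}, rules)
v_b = mcnet_value({"b"}, rules)
v_ab = mcnet_value({"a", "b"}, rules)
```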

MC-nets is a very flexible representation scheme, and can
be extended in different ways. One simple way to extend
it and increase its conciseness is to allow a wider class of
patterns in the rules. A pattern that we will use throughout
the remainder of the paper is one that applies only in the
absence of certain agents. This is useful for expressing concepts
such as substitutability or default values. Formally,
we express such patterns by

$$\{p_1 \wedge p_2 \wedge \ldots \wedge p_m \wedge \neg n_1 \wedge \neg n_2 \wedge \ldots \wedge \neg n_n\}$$

which has the semantics that such a rule will apply to a group
$S$ only if $p_i \in S$ for all $i = 1, \ldots, m$ and $n_j \notin S$ for all
$j = 1, \ldots, n$. We will call the $\{p_i\}_{i=1}^m$ in the above pattern
the positive literals, and $\{n_j\}_{j=1}^n$ the negative literals. Note
that if the pattern of a rule consists solely of negative literals,
we will consider that the empty set of agents will also satisfy
such a pattern, and hence $v(\emptyset)$ may be non-zero in the
presence of negative literals.

To demonstrate the increase in conciseness of representation,
consider the unit game described in section 2.2. To
represent such a game without using negative literals, we
will need $2^n$ rules for $n$ players: we need a rule of value 1
for each individual agent, a rule of value $-1$ for each pair of
agents to counter the double-counting, a rule of value 1 for
each triplet of agents, etc., similar to the inclusion-exclusion
principle. On the other hand, using negative literals, we
only need $n$ rules: value 1 for the first agent, value 1 for the
second agent in the absence of the first agent, value 1 for the
third agent in the absence of the first two agents, etc. The
representational savings can be exponential in the number
of agents.
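
The $n$-rule construction for the unit game can be checked with a short Python sketch. The encoding of a rule as a `(positive, negative, value)` triple and the function names are illustrative assumptions. The check enumerates every coalition of a four-agent game, so it is a sanity test rather than a proof.

```python
from itertools import combinations


def mcnet_value(S, rules):
    """Sum of rule values over rules whose positive literals are all in S
    and whose negative literals are all absent from S."""
    S = frozenset(S)
    return sum(
        val
        for pos, neg, val in rules
        if frozenset(pos) <= S and not (frozenset(neg) & S)
    )


def unit_game_rules(agents):
    """One rule per agent: value 1 if the agent is present and every
    earlier agent (in the given order) is absent."""
    return [({agent}, set(agents[:k]), 1) for k, agent in enumerate(agents)]


agents = ["a", "b", "c", "d"]
rules = unit_game_rules(agents)
coalitions = [c for k in range(1, 5) for c in combinations(agents, k)]
values = {c: mcnet_value(c, rules) for c in coalitions}
```

Exactly one rule fires for any non-empty coalition, namely the rule of its earliest member, which is why every value equals 1.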

Given a game represented as an MC-net, we can interpret
the set of rules that make up the game as a graph. We call
this graph the agent graph. The nodes in the graph will represent
the agents in the game, and for each rule in the MC-net,
we connect all the agents in the rule together and assign
a value to the clique formed by the set of agents. Notice that
to accommodate negative literals, we will need to annotate
the clique appropriately. This alternative view of MC-nets
will be useful in our algorithm for Core-Membership in
section 5.
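
A minimal Python sketch of the agent graph construction follows. It assumes the same hypothetical `(positive, negative, value)` rule encoding, treats agents appearing as negative literals as members of the rule's clique, and records only adjacency; the clique values and annotations mentioned above are left out.

```python
from itertools import combinations


def agent_graph(rules):
    """Adjacency sets: two agents are connected if some rule mentions both."""
    adj = {}
    for pos, neg, _value in rules:
        members = set(pos) | set(neg)
        for a in members:
            adj.setdefault(a, set())
        for a, b in combinations(sorted(members), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


# Hypothetical rules: {a AND b} -> 5 and {c AND NOT b} -> 2.
rules = [({"a", "b"}, set(), 5), ({"c"}, {"b"}, 2)]
adj = agent_graph(rules)
```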

We would like to end our discussion of the representation
scheme by mentioning a trade-off between the expressiveness
of patterns and the space required to represent them.
To represent a coalitional game in characteristic form, one
would need to specify all $2^n - 1$ values. There is no overhead
on top of that since there is a natural ordering of the
groups. For MC-nets, however, specification of the rules
requires specifying both the patterns and the values. The
patterns, if not represented compactly, may end up overwhelming
the savings from having fewer values to specify.
The space required for the patterns also leads to a trade-off
between the expressiveness of the allowed patterns and
the simplicity of representing them. However, we believe
that for most naturally arising games, there should be sufficient
structure in the problem such that our representation
achieves a net saving over the characteristic form.

3.2  Example:  Recommendation  Game 

As an example, we will use MC-nets to represent the recommendation
game discussed in the introduction. For each
product, as the benefit of knowing about the product will
count only once for each group, we need to capture substitutability
among the agents. This can be captured by a
scaled unit game. Suppose the value of the knowledge about
product $i$ is $v_i$, and there are $n_i$ agents, denoted by $\{x_i^j\}$,
who know about the product. The game for product $i$ can
then be represented as the following rules:

$$\begin{array}{l}
\{x_i^1\} \to v_i \\
\{x_i^2 \wedge \neg x_i^1\} \to v_i \\
\quad \vdots \\
\{x_i^{n_i} \wedge \neg x_i^{n_i - 1} \wedge \cdots \wedge \neg x_i^1\} \to v_i
\end{array}$$

The entire game can then be built up from the sets of rules
of each product. The space requirement will be $O(m n_*)$,
where $m$ is the number of products in the system, and $n_*$
is the maximum number of agents who know of the same
product.
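
The rule construction for the recommendation game can be sketched and checked in Python. The product names, agent names, and values below are hypothetical, as is the `(positive, negative, value)` rule encoding; the check compares the MC-net value of every coalition with the direct definition (total value of the distinct products known to the coalition).

```python
from itertools import combinations


def recommendation_rules(knows, product_value):
    """knows maps each product to an ordered list of agents who know it."""
    rules = []
    for product, agents in knows.items():
        for k, agent in enumerate(agents):
            # The k-th knower counts only if all earlier knowers are absent.
            rules.append(({agent}, set(agents[:k]), product_value[product]))
    return rules


def mcnet_value(S, rules):
    S = frozenset(S)
    return sum(
        val
        for pos, neg, val in rules
        if frozenset(pos) <= S and not (frozenset(neg) & S)
    )


def direct_value(S, knows, product_value):
    """Total value of the distinct products known by some member of S."""
    return sum(product_value[p] for p, ag in knows.items() if set(ag) & set(S))


# Hypothetical instance.
knows = {"book": ["ann", "bob"], "film": ["bob", "cat"]}
product_value = {"book": 2, "film": 3}
everyone = ["ann", "bob", "cat"]
rules = recommendation_rules(knows, product_value)
coalitions = [c for k in range(0, 4) for c in combinations(everyone, k)]
matches = all(
    mcnet_value(c, rules) == direct_value(c, knows, product_value)
    for c in coalitions
)
```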

3.3  Representation  Power 

We  will  discuss  the  expressiveness  and  conciseness  of  our 
representation  scheme  and  compare  it  with  the  previous 
works  in  this  subsection. 

Proposition 1. Marginal contribution networks constitute
a fully expressive representation scheme.

Proof. Consider an arbitrary coalitional game $(N, v)$ in
characteristic form representation. We can construct a set
of rules to describe this game by starting from the singleton
sets and building up the set of rules. For any singleton set
$\{i\}$, we create a rule $\{i\} \to v(\{i\})$. For any pair of agents $\{i, j\}$,
we create a rule $\{i \wedge j\} \to v(\{i, j\}) - v(\{i\}) - v(\{j\})$. We
can continue to build up rules in a manner similar to the
inclusion-exclusion principle. Since the game is arbitrary,
MC-nets are fully expressive. □
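
The construction in the proof can be sketched in Python. The sketch fills in the general step with the standard inclusion-exclusion sum over sub-coalitions, which reduces to the singleton and pair rules given in the proof; it assumes $v(\emptyset) = 0$, and the function names and the example game are illustrative.

```python
from itertools import combinations


def subsets(items, min_size=0):
    items = sorted(items)
    for k in range(min_size, len(items) + 1):
        for c in combinations(items, k):
            yield frozenset(c)


def rules_from_characteristic(agents, v):
    """One positive-literal rule per coalition T, with value
    sum over R subset of T of (-1)^(|T|-|R|) * v(R). Assumes v(empty) = 0."""
    rules = []
    for T in subsets(agents, 1):
        d = sum((-1) ** (len(T) - len(R)) * v(R) for R in subsets(T))
        if d != 0:  # rules of value 0 can be dropped
            rules.append((T, d))
    return rules


def value_from_rules(S, rules):
    S = frozenset(S)
    return sum(d for T, d in rules if T <= S)


# Hypothetical game: worth 1 if the coalition has at least two of three agents.
agents = ["a", "b", "c"]
v = lambda S: 1 if len(S) >= 2 else 0
rules = rules_from_characteristic(agents, v)
reconstructs = all(value_from_rules(S, rules) == v(S) for S in subsets(agents))
```

For this game the construction yields a rule of value 1 for each pair and a rule of value $-2$ for the triple, which corrects the triple counting of the pair rules.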

Using the construction outlined in the proof, we can show
that our representation scheme can simulate the multi-issue
representation scheme of [3] in almost the same amount of
space.

Proposition 2. Marginal contribution networks use at
most a linear factor (in the number of agents) more space
than multi-issue representation for any game.

Proof. Given a game in multi-issue representation, we
start by describing each of the subgames, which are represented
in characteristic form in [3], with a set of rules.
We then build up the grand game by including all the rules
from the subgames. Note that our representation may require
space larger by a linear factor due to the need to
describe the patterns for each rule. On the other hand, our
approach may have fewer than an exponential number of rules
for each subgame, depending on the structure of these subgames,
and therefore may be more concise than multi-issue
representation. □

On the other hand, there are games that require exponentially
more space to represent under the multi-issue scheme
compared to our scheme.

Proposition 3. Marginal contribution networks are exponentially
more concise than multi-issue representation for
certain games.

Proof. Consider a unit game over all the agents $N$. As
explained in 3.1, this game can be represented in linear space
using MC-nets with negative literals. However, as there is
no decomposition of this game into smaller subgames, it will
require space $O(2^n)$ to represent this game under the multi-issue
representation. □

Under the agent graph interpretation of MC-nets, we can
see that MC-nets is a generalization of the graphical representation
in [4], namely from weighted graphs to weighted
hypergraphs.

Proposition 4. Marginal contribution networks can represent
any games in graphical form (under [4]) in the same
amount of space.

Proof. Given a game in graphical form, $G$, for each edge
$(i, j)$ with weight $w_{ij}$ in the graph, we create a rule $\{i, j\} \to
w_{ij}$. Clearly this takes exactly the same space as the size of
$G$, and by the additive semantics of the rules, it represents
the same game as $G$. □

4.  COMPUTING  THE  SHAPLEY  VALUE 

Given an MC-net, we have a simple algorithm to compute
the  Shapley  value  of  the  game.  Considering  each  rule  as  a 
separate  game,  we  start  by  computing  the  Shapley  value  of 
the  agents  for  each  rule.  For  each  agent,  we  then  sum  up 
the  Shapley  values  of  that  agent  over  all  the  rules.  We  first 
show  that  this  final  summing  process  correctly  computes  the 
Shapley  value  of  the  agents. 
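
As a rough sketch of the procedure just outlined, the following Python code sums per-rule Shapley values using the closed forms derived in this section: a positive literal of a rule with $p$ positive and $n$ negative literals receives $\frac{(p-1)!\,n!}{(p+n)!} v$, and a negative literal receives $\frac{p!\,(n-1)!}{(p+n)!}(-v)$. The `(positive, negative, value)` rule encoding, the function name, and the example rules are illustrative assumptions.

```python
from math import factorial


def shapley_from_rules(agents, rules):
    """Shapley value of each agent, accumulated rule by rule."""
    phi = {a: 0.0 for a in agents}
    for pos, neg, val in rules:
        p, n = len(pos), len(neg)
        denom = factorial(p + n)
        if p > 0:
            # Share of each positive literal; equals val / p when n == 0.
            pos_share = factorial(p - 1) * factorial(n) / denom * val
            for a in pos:
                phi[a] += pos_share
        if n > 0:
            # Share of each negative literal (it cancels the rule).
            neg_share = factorial(p) * factorial(n - 1) / denom * (-val)
            for a in neg:
                phi[a] += neg_share
    return phi


# Hypothetical rules: {a AND b} -> 5, {b} -> 2, {c AND NOT a} -> 3.
agents = ["a", "b", "c"]
rules = [({"a", "b"}, set(), 5), ({"b"}, set(), 2), ({"c"}, {"a"}, 3)]
phi = shapley_from_rules(agents, rules)
```

Each rule is visited once and the work per rule is proportional to the size of its pattern (treating the factorials as precomputable), so the loop mirrors the linear-time claim made for the algorithm.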


Proposition 5. The Shapley value of an agent in a marginal
contribution network is equal to the sum of the Shapley values
of that agent over each rule.

Proof. For any group $S$, under the MC-nets representation,
$v(S)$ is defined to be the sum over the values of all the
rules that apply to $S$. Therefore, considering each rule as a
game, by the (ADD) axiom discussed in section 2, the Shapley
value of the game created from aggregating all the rules
is equal to the sum of the Shapley values over the rules. □

The remaining question is how to compute the Shapley
values of the rules. We can separate the analysis into two
cases, one for rules with only positive literals and one for
rules with mixed literals.

For rules that have only positive literals, the Shapley value
of each agent in the rule is $v/m$, where $v$ is the value of the rule and
$m$ is the number of agents in the rule. This is a direct
consequence of the (SYM) axiom of the Shapley value, as
the agents in a rule are indistinguishable from each other.

For rules that have both positive and negative literals, we
can consider the positive and the negative literals separately.
For a given positive literal $i$, the arrival of $i$ makes the rule
apply only if $i$ occurs in a given permutation after the rest of
the positive literals but before any of the negative literals.
Formally, let $\phi_i$ denote the Shapley value of $i$, $p$ denote the
cardinality of the positive set, and $n$ denote the cardinality of
the negative set; then

$$\phi_i = \frac{(p-1)!\,n!}{(p+n)!}\, v = \frac{v}{p \binom{p+n}{n}}$$

For a given negative literal $j$, $j$ will be responsible for cancelling
the application of the rule if all positive literals come
before the negative literals in the ordering, and $j$ is the first
among the negative literals. Therefore,

$$\phi_j = \frac{p!\,(n-1)!}{(p+n)!}\,(-v)$$

By the (SYM) axiom, all positive literals will have the value
of $\phi_i$ and all negative literals will have the value of $\phi_j$.

Note that the sum over all agents in rules with mixed
literals is 0. This is to be expected as these rules contribute
0 to the grand coalition. The fact that these rules have no
effect on the grand coalition may appear odd at first. But
this is because the purpose of such rules is to define the
values of coalitions smaller than the grand coalition.

In terms of computational complexity, given that the Shapley
value of any agent in a given rule can be computed in
time linear in the pattern of the rule, the total running time
of the algorithm for computing the Shapley value of the
game is linear in the size of the input.

5. ANSWERING CORE-RELATED QUESTIONS

There are a few different but related computational problems
associated with the solution concept of the core. We
will focus on the following two problems:

Definition 1. (Core-Membership) Given a coalitional game
and a payoff vector $x$, determine if $x$ is in the core.

Definition 2. (Core-Non-Emptiness) Given a coalitional
game, determine if the core is non-empty.

In the rest of the section, we will first show that these
two problems are coNP-complete and coNP-hard respectively,
and discuss some complexity considerations about
these problems. We will then review the main ideas of tree
decomposition as it will be used extensively in our algorithm
for Core-Membership. Next, we will present the algorithm
for Core-Membership, and show that the algorithm runs
in polynomial time for graphs of bounded treewidth. We end
by extending this algorithm to answer the question of
Core-Non-Emptiness in polynomial time for graphs of bounded
treewidth.

5.1 Computational Complexity

The hardness of Core-Membership and Core-Non-Emptiness
follows directly from the hardness results of games
over weighted graphs in [4].

Proposition 6. Core-Membership for games represented
as marginal contribution networks is coNP-complete.

Proof. Core-Membership in MC-nets is in the class
coNP since any set of agents $S$ for which $v(S) > x(S)$
will serve as a certificate to show that $x$ does not belong to
the core. As for its hardness, given any instance of
Core-Membership for a game in graphical form of [4], we can
encode the game in exactly the same space using an MC-net
due to Proposition 4. Since Core-Membership for games
in graphical form is coNP-complete, Core-Membership in
MC-nets is coNP-hard. □

Proposition 7. Core-Non-Emptiness for games represented
as marginal contribution networks is coNP-hard.

Proof. The same argument for hardness between games
in graphical form and MC-nets holds for the problem of
Core-Non-Emptiness. □

We do not know of a certificate to show that Core-Non-Emptiness
is in the class coNP as of now. Note that
the “obvious” certificate of a balanced set of weights based
on the Bondareva-Shapley theorem is exponential in size. In
[4], Deng and Papadimitriou showed the coNP-completeness
of Core-Non-Emptiness via a combinatorial characterization,
namely that the core is non-empty if and only if
there is no negative cut in the graph. In MC-nets, however,
there need not be a negative hypercut in the graph for the
core to be empty, as demonstrated by the following game
($N = \{1, 2, 3, 4\}$):

$$v(S) = \begin{cases} 1 & \text{if } S = \{1, 2, 3, 4\} \\ 3/4 & \text{if } S = \{1, 2\}, \{1, 3\}, \{1, 4\}, \text{ or } \{2, 3, 4\} \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

Applying the Bondareva-Shapley theorem, if we let $\lambda_{12} =
\lambda_{13} = \lambda_{14} = 1/3$, and $\lambda_{234} = 2/3$, this set of weights demonstrates
that the game is not balanced, and hence the core
is empty. On the other hand, this game can be represented
with MC-nets as follows (weights on hyperedges):

$$\begin{array}{l}
w(\{1, 2\}) = w(\{1, 3\}) = w(\{1, 4\}) = 3/4 \\
w(\{1, 2, 3\}) = w(\{1, 2, 4\}) = w(\{1, 3, 4\}) = -6/4 \\
w(\{2, 3, 4\}) = 3/4 \\
w(\{1, 2, 3, 4\}) = 10/4
\end{array}$$

No matter how the set is partitioned, the sum over the
weights of the hyperedges in the cut is always non-negative.
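
The arithmetic in this example can be verified mechanically. The Python sketch below uses exact fractions to check that the stated weights form a balanced collection, that they violate inequality (3), and that the hyperedge weights reproduce game (5) on every coalition. Variable names such as `lam` and `mcnet_v` are illustrative; the claim about hypercuts is not checked here.

```python
from fractions import Fraction as F
from itertools import combinations

N = (1, 2, 3, 4)
fs = frozenset


def v(S):
    """The game of equation (5)."""
    S = fs(S)
    if S == fs(N):
        return F(1)
    if S in (fs({1, 2}), fs({1, 3}), fs({1, 4}), fs({2, 3, 4})):
        return F(3, 4)
    return F(0)


# Weights from the text: 1/3 on {1,2}, {1,3}, {1,4} and 2/3 on {2,3,4}.
lam = {fs({1, 2}): F(1, 3), fs({1, 3}): F(1, 3),
       fs({1, 4}): F(1, 3), fs({2, 3, 4}): F(2, 3)}
is_balanced = all(sum(l for S, l in lam.items() if i in S) == 1 for i in N)
weighted_sum = sum(l * v(S) for S, l in lam.items())

# Hyperedge weights of the MC-net representation.
w = {fs({1, 2}): F(3, 4), fs({1, 3}): F(3, 4), fs({1, 4}): F(3, 4),
     fs({1, 2, 3}): F(-6, 4), fs({1, 2, 4}): F(-6, 4), fs({1, 3, 4}): F(-6, 4),
     fs({2, 3, 4}): F(3, 4), fs(N): F(10, 4)}


def mcnet_v(S):
    S = fs(S)
    return sum(val for T, val in w.items() if T <= S)


coalitions = [c for k in range(1, 5) for c in combinations(N, k)]
same_game = all(mcnet_v(S) == v(S) for S in coalitions)
```

The weighted sum comes to $5/4$, which exceeds $v(N) = 1$, so the game is not balanced.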

To overcome the computational hardness of these problems,
we have developed algorithms that are based on tree
decomposition techniques. For Core-Membership, our algorithm
runs in time exponential only in the treewidth of the
agent graph. Thus, for graphs of small treewidth, such as
trees, we have a tractable solution to determine if a payoff
vector is in the core. By using this procedure as a separation
oracle, i.e., a procedure for returning the inequality
violated by a candidate solution, when solving a linear program
related to Core-Non-Emptiness with the ellipsoid
method, we can obtain a polynomial time algorithm
for Core-Non-Emptiness for graphs of bounded treewidth.


5.2  Review  of  Tree  Decomposition 

As our algorithm for Core-Membership relies heavily on tree decomposition, we will first briefly review the main ideas in tree decomposition and treewidth.3

Definition 3. A tree decomposition of a graph $G = (V, E)$ is a pair $(X, T)$, where $T = (I, F)$ is a tree and $X = \{X_i \mid i \in I\}$ is a family of subsets of $V$, one for each node of $T$, such that

• $\bigcup_{i \in I} X_i = V$;

• For all edges $(v, w) \in E$, there exists an $i \in I$ with $v \in X_i$ and $w \in X_i$; and

• (Running Intersection Property) For all $i, j, k \in I$: if $j$ is on the path from $i$ to $k$ in $T$, then $X_i \cap X_k \subseteq X_j$.

The treewidth of a tree decomposition is defined as the maximum cardinality over all sets in $X$, less one. The treewidth of a graph is defined as the minimum treewidth over all tree decompositions of the graph.
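The three conditions of the definition can be checked directly. The sketch below is illustrative only: the function names and the dictionary-based encoding of bags are assumptions, the input `tree_edges` is assumed to already form a tree, and the running intersection property is tested in its equivalent form (for every vertex, the tree nodes whose bags contain it form a connected subtree).

```python
from collections import deque


def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """bags maps each tree node to its set X_i; tree_edges lists the edges of T."""
    # Condition 1: the bags together cover every vertex of the graph.
    if set().union(*bags.values()) != set(vertices):
        return False
    # Condition 2: every graph edge lies inside at least one bag.
    for v, w in edges:
        if not any(v in bag and w in bag for bag in bags.values()):
            return False
    # Condition 3: for each vertex, the nodes holding it are connected in T.
    adjacency = {node: set() for node in bags}
    for a, b in tree_edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    for v in vertices:
        holders = {node for node, bag in bags.items() if v in bag}
        start = next(iter(holders))
        reached, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node] & holders:
                if neighbour not in reached:
                    reached.add(neighbour)
                    queue.append(neighbour)
        if reached != holders:
            return False
    return True


def width(bags):
    """Maximum bag cardinality, less one."""
    return max(len(bag) for bag in bags.values()) - 1


# A path graph 1-2-3-4 and a decomposition of width 1.
path_vertices = [1, 2, 3, 4]
path_edges = [(1, 2), (2, 3), (3, 4)]
good_bags = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}}
good_tree = [("a", "b"), ("b", "c")]

# Same bags attached in the wrong order: vertex 2 is held by two
# non-adjacent tree nodes, so the running intersection property fails.
bad_bags = {"a": {1, 2}, "b": {3, 4}, "c": {2, 3}}

print(is_tree_decomposition(path_vertices, path_edges, good_bags, good_tree))
print(width(good_bags))
print(is_tree_decomposition(path_vertices, path_edges, bad_bags, good_tree))
```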

Given  a  tree  decomposition,  we  can  convert  it  into  a  nice 
tree  decomposition  of  the  same  treewidth,  and  of  size  linear 
in  that  of  T. 

Definition 4. A tree decomposition $T$ is nice if $T$ is rooted and has four types of nodes:

Leaf nodes $i$ are leaves of $T$ with $|X_i| = 1$.

Introduce nodes $i$ have one child $j$ such that $X_i = X_j \cup \{v\}$ for some $v \in V$.

Forget nodes $i$ have one child $j$ such that $X_i = X_j \setminus \{v\}$ for some $v \in X_j$.

Join nodes $i$ have two children $j$ and $k$ with $X_i = X_j = X_k$.

An example of a (partial) nice tree decomposition together with a classification of the different types of nodes is in Figure 1. In the following section, we will refer to nodes in the tree decomposition as nodes, and nodes in the agent graph as agents.

5.3  Algorithm  for  Core  Membership 

Our algorithm for Core-Membership takes as input a nice tree decomposition $T$ of the agent graph and a payoff vector $x$. By definition, if $x$ belongs to the core, then $x(S) \ge v(S)$ for all groups $S \subseteq N$. Therefore, the difference $x(S) - v(S)$ measures how "close" the group $S$ is to violating the core condition. We call this difference the excess of group $S$.

Definition 5. The excess of a coalition $S$, $e(S)$, is defined as $x(S) - v(S)$.
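To make the definition concrete, here is a small sketch that computes excesses and lists every coalition with negative excess by exhaustive enumeration. The three-agent majority game used as input is a hypothetical example, not one from the text, and the function names are illustrative.

```python
from fractions import Fraction as F
from itertools import combinations


def excess(coalition, payoff, value):
    """e(S) = x(S) - v(S) for a payoff dict and a characteristic function."""
    return sum(payoff[a] for a in coalition) - value(frozenset(coalition))


def violated_coalitions(agents, payoff, value):
    """Enumerate all non-empty coalitions and keep those with negative excess."""
    violated = []
    for size in range(1, len(agents) + 1):
        for coalition in combinations(sorted(agents), size):
            if excess(coalition, payoff, value) < 0:
                violated.append(frozenset(coalition))
    return violated


def majority_value(coalition):
    """Hypothetical game: any coalition of at least two agents is worth 1."""
    return 1 if len(coalition) >= 2 else 0


equal_split = {1: F(1, 3), 2: F(1, 3), 3: F(1, 3)}
print(violated_coalitions([1, 2, 3], equal_split, majority_value))
```

The enumeration visits all $2^n - 1$ coalitions, which is exactly the cost that a structure-aware algorithm tries to avoid.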

A brute-force approach to determining whether a payoff vector belongs to the core has to check that the excesses of all groups are non-negative. However, this approach ignores the structure in the agent graph that allows an algorithm to infer that certain groups have non-negative excesses due to the excesses computed elsewhere in the graph. Tree decomposition is the key to taking advantage of such inferences in a structured way.

3 This is based largely on the materials from a survey paper by Bodlaender [1].

Figure 1: Example of a (partial) nice tree decomposition

For now, let us focus on rules with positive literals. Suppose we have already checked that the excesses of all sets $R \subseteq U$ are non-negative, and we would like to check if the addition of an agent $i$ to the set $U$ will create a group with negative excess. A naive solution is to compute the excesses of all sets that include $i$. The excess of the group $(R \cup \{i\})$ for any group $R$ can be computed as follows:

\[
e(R \cup \{i\}) = e(R) + x_i - v(c) \tag{6}
\]

where $c$ is the cut between $R$ and $i$, and $v(c)$ is the sum of the weights of the edges in the cut.

However, suppose that from the tree decomposition, we know that $i$ is only connected to a subset of $U$, say $S$, which we will call the entry set to $U$. Ideally, because $i$ does not share any edges with members of $\bar{U} = (U \setminus S)$, we would hope that an algorithm can take advantage of this structure by checking only sets that are subsets of $(S \cup \{i\})$. This computational saving may be possible since $(x_i - v(c))$ in the update equation (6) does not depend on $\bar{U}$. However, we cannot simply ignore $\bar{U}$, as members of $\bar{U}$ may still influence the excesses of groups that include agent $i$ through group $S$. Specifically, if there exists a group $T \supset S$ such that $e(T) < e(S)$, then even when $(S \cup \{i\})$ has non-negative excess, $(T \cup \{i\})$ may have negative excess. In other words, the excess available at $S$ may have been "drained" away due to $T$. This motivates the definition of the reserve of a group.

Definition 6. The reserve of a coalition $S$ relative to a coalition $U$ is the minimum excess over all coalitions between $S$ and $U$, i.e., all $T : S \subseteq T \subseteq U$. We denote this value by $r(S, U)$. We will refer to the group $T$ that has the minimum excess as $\arg r(S, U)$. We will also call $U$ the limiting set of the reserve and $S$ the base set of the reserve.
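The reserve can be computed directly from its definition by enumerating every $T$ between the base set and the limiting set. The sketch below does this for a hypothetical three-agent graph game (edge weights and payoffs are made up for illustration); it shows a base set whose own excess is non-negative but whose reserve is negative, which is the "draining" effect described above.

```python
from itertools import combinations


def reserve(base, limit, excess):
    """Return (r(S, U), arg r(S, U)) by enumerating all T with S <= T <= U."""
    base = frozenset(base)
    optional = sorted(frozenset(limit) - base)
    best_value, best_set = None, None
    for size in range(len(optional) + 1):
        for extra in combinations(optional, size):
            candidate = base | frozenset(extra)
            e = excess(candidate)
            if best_value is None or e < best_value:
                best_value, best_set = e, candidate
    return best_value, best_set


# Hypothetical graph game on the path 1 - 2 - 3 with unit edge weights.
EDGE_WEIGHTS = {(1, 2): 1, (2, 3): 1}
PAYOFF = {1: 2, 2: 0, 3: 0}


def path_excess(coalition):
    """e(T) = x(T) - v(T), where v(T) sums the edges with both ends in T."""
    inside = sum(w for (a, b), w in EDGE_WEIGHTS.items()
                 if a in coalition and b in coalition)
    return sum(PAYOFF[a] for a in coalition) - inside


print(path_excess({2}))                        # excess of the base set alone
print(reserve({2}, {1, 2, 3}, path_excess))    # reserve and the set attaining it
```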

Our algorithm works by keeping track of the reserves of all non-empty subsets that can be formed by the agents of a node at each of the nodes of the tree decomposition. Starting from the leaves of the tree and working towards the root, at each node $i$, our algorithm computes the reserves of all groups $S \subseteq X_i$, limited by the set of agents in the subtree rooted at $i$, $T_i$, except those in $(X_i \setminus S)$. The agents in $(X_i \setminus S)$ are excluded to ensure that $S$ is an entry set. Specifically, $S$ is the entry set to $((T_i \setminus X_i) \cup S)$.

To accommodate negative literals, we need to make two adjustments. First, the cut between an agent $m$ and a set $S$ at node $i$ now refers to the cut among agent $m$, set $S$, and set $\neg(X_i \setminus S)$, and its value must be computed accordingly. Second, when an agent $m$ is introduced to a group at an introduce node, we also need to consider the change in the reserves of groups that do not include $m$ due to a possible cut involving $\neg m$ and the group.

As  an  example  of  the  reserve  values  we  keep  track  of  at  a 
tree  node,  consider  node  i  of  the  tree  in  Figure  1.  At  node 
i,  we  will  keep  track  of  the  following: 

$r(\{1\}, \{1, 2, \ldots\})$
$r(\{3\}, \{2, 3, \ldots\})$
$r(\{4\}, \{2, 4, \ldots\})$
$r(\{1, 3\}, \{1, 2, 3, \ldots\})$
$r(\{1, 4\}, \{1, 2, 4, \ldots\})$
$r(\{3, 4\}, \{2, 3, 4, \ldots\})$
$r(\{1, 3, 4\}, \{1, 2, 3, 4, \ldots\})$

where  the  dots  . . .  refer  to  the  agents  rooted  under  node  m. 

As notation, we will use $r_i(S)$ to denote $r(S, U)$ at node $i$, where $U$ is the set of agents in the subtree rooted at node $i$ excluding agents in $(X_i \setminus S)$. We sometimes refer to these values as the r-values of a node. The details of the r-value computations are in Algorithm 1.

To  determine  if  the  payoff  vector  x  is  in  the  core,  during 
the  r-value  computation  at  each  node,  we  can  check  if  all  of 
the  r-values  are  non-negative.  If  this  is  so  for  all  nodes  in 
the  tree,  the  payoff  vector  x  is  in  the  core.  The  correctness 
of  the  algorithm  is  due  to  the  following  proposition. 

Proposition 8. The payoff vector $x$ is not in the core if and only if the r-value at some node $i$ for some group $S$ is negative.

Proof. ($\Leftarrow$) If the reserve at some node $i$ for some group $S$ is negative, then there exists a coalition $T$ for which $e(T) = x(T) - v(T) < 0$, hence $x$ is not in the core.

($\Rightarrow$) Suppose $x$ is not in the core; then there exists some group $R^*$ such that $e(R^*) < 0$. Let $X_{root}$ be the set of agents at the root. For any set $S \subseteq X_{root}$, $r_{root}(S)$ has the base set $S$ and the limiting set $((N \setminus X_{root}) \cup S)$. The union over all of these ranges includes all sets $U$ for which $U \cap X_{root} \ne \emptyset$. Therefore, if $R^*$ is not disjoint from $X_{root}$, the r-value for some group in the root is negative.

If $R^*$ is disjoint from $X_{root}$, consider the forest $\{T_i\}$ resulting from the removal of all tree nodes that include agents in $X_{root}$.
Algorithm 1 Subprocedures for Core Membership

Leaf-Node($i$)
    $r_i(X_i) \leftarrow e(X_i)$

Introduce-Node($i$)
    $j \leftarrow$ child of $i$
    $m \leftarrow X_i \setminus X_j$ {the introduced node}
    for all $S \subseteq X_j$, $S \ne \emptyset$ do
        $C \leftarrow$ all hyperedges in the cut of $m$, $S$, and $\neg(X_i \setminus S)$
        $r_i(S \cup \{m\}) \leftarrow r_j(S) + x_m - v(C)$
        $C \leftarrow$ all hyperedges in the cut of $\neg m$, $S$, and $\neg(X_i \setminus S)$
        $r_i(S) \leftarrow r_j(S) - v(C)$
    end for
    $r_i(\{m\}) \leftarrow e(\{m\})$

Forget-Node($i$)
    $j \leftarrow$ child of $i$
    $m \leftarrow X_j \setminus X_i$ {the forgotten node}
    for all $S \subseteq X_i$, $S \ne \emptyset$ do
        $r_i(S) \leftarrow \min(r_j(S), r_j(S \cup \{m\}))$
    end for

Join-Node($i$)
    $\{j, k\} \leftarrow$ {left, right} child of $i$
    for all $S \subseteq X_i$, $S \ne \emptyset$ do
        $r_i(S) \leftarrow r_j(S) + r_k(S) - e(S)$
    end for


By the running intersection property, the sets of agents in the trees $T_i$ are disjoint. Thus, if $R^* = \bigcup_i S_i$ for some $S_i \subseteq T_i$, then $e(R^*) = \sum_i e(S_i) < 0$ implies that some group $S_i$ has negative excess as well. Therefore, we only need to check the r-values of the nodes on the individual trees in the forest.

But for each tree in the forest, we can apply the same argument restricted to the agents in the tree. In the base case, we have the leaf nodes of the original tree decomposition, say, for agent $i$. If $R^* = \{i\}$, then $r(\{i\}) = e(\{i\}) < 0$. Therefore, by induction, if $e(R^*) < 0$, some reserve at some node would be negative. □

We  will  next  explain  the  intuition  behind  the  correctness 
of  the  computations  for  the  r-values  in  the  tree  nodes.  A 
detailed  proof  of  correctness  of  these  computations  can  be 
found  in  the  appendix  under  Lemmas  1  and  2. 

Proposition 9. The procedure in Algorithm 1 correctly computes the r-values at each of the tree nodes.

Proof.  (Sketch)  We  can  perform  a  case  analysis  over 
the  four  types  of  tree  nodes  in  a  nice  tree  decomposition. 

Leaf nodes ($i$) The only reserve value to be computed is $r_i(X_i)$, which equals $r(X_i, X_i)$, and therefore it is just the excess of group $X_i$.

Forget nodes ($i$ with child $j$) Let $m$ be the forgotten node. For any subset $S \subseteq X_i$, $\arg r_i(S)$ must be chosen between the groups of $S$ and $S \cup \{m\}$, and hence we choose the lower of the two from the r-values at node $j$.

Introduce nodes ($i$ with child $j$) Let $m$ be the introduced node. For any subset $T \subseteq X_i$ that includes $m$, let $S$ denote $(T \setminus \{m\})$. By the running intersection property, there are no rules that involve $m$ and agents of the subtree rooted at node $i$ except those involving $m$ and agents in $X_i$. As both the base set and the limiting set of the r-values of node $j$ and node $i$ differ by $\{m\}$, for any group $V$ that lies between the base set and the limiting set of node $i$, the excess of group $V$ will differ by a constant amount from the corresponding group $(V \setminus \{m\})$ at node $j$. Therefore, the set $\arg r_i(T)$ equals the set $\arg r_j(S) \cup \{m\}$, and $r_i(T) = r_j(S) + x_m - v(\mathrm{cut})$, where $v(\mathrm{cut})$ is the value of the rules in the cut between $m$ and $S$. For any subset $S \subseteq X_i$ that does not include $m$, we need to consider the values of rules that include $\neg m$ as a literal in the pattern. Also, when computing the reserve, the payoff $x_m$ will not contribute to group $S$. Therefore, together with the running intersection property as argued above, we can show that $r_i(S) = r_j(S) - v(\mathrm{cut})$.

Join nodes ($i$ with left child $j$ and right child $k$) For any given set $S \subseteq X_i$, consider the r-values of that set at $j$ and $k$. If $\arg r_j(S)$ or $\arg r_k(S)$ includes agents not in $S$, then the parts of $\arg r_j(S)$ and $\arg r_k(S)$ outside $S$ will be disjoint from each other due to the running intersection property. Therefore, we can decompose $\arg r_i(S)$ into three sets: $(\arg r_j(S) \setminus S)$ on the left, $S$ in the middle, and $(\arg r_k(S) \setminus S)$ on the right. The reserve $r_j(S)$ will cover the excesses on the left and in the middle, whereas the reserve $r_k(S)$ will cover those on the right and in the middle, and so the excess in the middle is double-counted. We adjust for the double-counting by subtracting the excess in the middle from the sum of the two reserves $r_j(S)$ and $r_k(S)$.

□ 

Finally, note that each step in the computation of the r-values of each node $i$ takes time at most exponential in the size of $X_i$, hence the algorithm runs in time exponential only in the treewidth of the graph.
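The four subprocedures can be sketched in code for a restricted case. The block below is a simplified illustration, not the authors' implementation: it assumes a graph game (every rule is a weighted edge between two agents, with positive literals only, so the cut involving $\neg m$ is always empty), and it assumes a nice tree decomposition supplied as nested dictionaries. The node encoding and all function names are hypothetical.

```python
from itertools import combinations


def nonempty_subsets(bag):
    """Yield every non-empty subset of a bag as a frozenset."""
    items = sorted(bag)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def excess(coalition, x, w):
    """e(S) = x(S) - v(S); v(S) sums the edge weights with both ends in S."""
    inside = sum(wt for (a, b), wt in w.items()
                 if a in coalition and b in coalition)
    return sum(x[a] for a in coalition) - inside


def r_values(node, x, w, minima):
    """Compute the r-values of `node` bottom-up, recording each node's minimum."""
    kind, bag = node["kind"], frozenset(node["bag"])
    kids = [r_values(child, x, w, minima) for child in node.get("children", [])]
    if kind == "leaf":
        r = {bag: excess(bag, x, w)}
    elif kind == "introduce":
        (m,) = bag - frozenset(node["children"][0]["bag"])
        r = {frozenset({m}): excess({m}, x, w)}
        for s, val in kids[0].items():
            cut = sum(wt for (a, b), wt in w.items()
                      if (a == m and b in s) or (b == m and a in s))
            r[s | {m}] = val + x[m] - cut
            r[s] = val  # no negative literals: the cut with "not m" is empty
    elif kind == "forget":
        (m,) = frozenset(node["children"][0]["bag"]) - bag
        r = {s: min(kids[0][s], kids[0][s | {m}]) for s in nonempty_subsets(bag)}
    else:  # join: both children carry the same bag
        r = {s: kids[0][s] + kids[1][s] - excess(s, x, w)
             for s in nonempty_subsets(bag)}
    if r:
        minima.append(min(r.values()))
    return r


def in_core(root, x, w):
    """True when no r-value at any node is negative."""
    minima = []
    r_values(root, x, w, minima)
    return all(v >= 0 for v in minima)


def leaf(agent):
    return {"kind": "leaf", "bag": {agent}}


def unary(kind, bag, child):
    return {"kind": kind, "bag": set(bag), "children": [child]}


# Path 1 - 2 - 3 with unit edge weights; agent 2 sits in the join node.
w = {(1, 2): 1, (2, 3): 1}
left = unary("forget", {2}, unary("introduce", {1, 2}, leaf(1)))
right = unary("forget", {2}, unary("introduce", {2, 3}, leaf(3)))
root = {"kind": "join", "bag": {2}, "children": [left, right]}

print(in_core(root, {1: 0.5, 2: 1, 3: 0.5}, w))  # True: every excess is >= 0
print(in_core(root, {1: 1.5, 2: 0.5, 3: 0}, w))  # False: e({2, 3}) = -0.5
```

As in the text, this check covers only the inequalities $x(S) \ge v(S)$. Each node touches at most $2^{|X_i|}$ subsets, which is where the exponential dependence on the treewidth comes from.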

5.4  Algorithm  for  Core  Non-emptiness 

We can extend the algorithm for Core-Membership into an algorithm for Core-Non-Emptiness. As described in section 2, whether the core is empty can be checked using the optimization program based on the balancedness condition (3). Unfortunately, that program has an exponential number of variables. On the other hand, the dual of the program has only $n$ variables, and can be written as follows:

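\[
\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^{n} x_i \\
\text{subject to} \quad & x(S) \ge v(S), \quad \forall S \subset N
\end{aligned}
\tag{7}
\]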

By strong duality, the optimal value of (7) is equal to the optimal value of (4), the primal program described in section 2. Therefore, by the Bondareva-Shapley theorem, if the optimal value of (7) is greater than $v(N)$, the core is empty.
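As a worked check of this criterion, program (7) can be solved by hand for the four-agent game (5). The derivation below is an illustration added for this purpose; the particular feasible point is one convenient choice, not the only optimum.

```latex
\[
\begin{aligned}
&\text{Feasible point: } x = \left(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}\right),
  \qquad \textstyle\sum_i x_i = \tfrac{5}{4}, \\
&\quad x_1 + x_j = \tfrac{3}{4} \ge v(\{1, j\}) \ (j = 2, 3, 4),
  \qquad x_2 + x_3 + x_4 = \tfrac{3}{4} \ge v(\{2, 3, 4\}). \\[4pt]
&\text{Lower bound for any feasible } x: \\
&\quad 3 \textstyle\sum_i x_i
  = \sum_{j=2}^{4} (x_1 + x_j) + 2\,(x_2 + x_3 + x_4)
  \ge 3 \cdot \tfrac{3}{4} + 2 \cdot \tfrac{3}{4} = \tfrac{15}{4}. \\[4pt]
&\text{Hence the optimal value of (7) is } \tfrac{5}{4} > 1 = v(N),
  \text{ so the core of game (5) is empty.}
\end{aligned}
\]
```

All remaining coalitions have value 0, so their constraints hold for any non-negative $x$. The optimal value $5/4$ coincides with the weighted sum obtained from the balanced weights $\lambda_{12} = \lambda_{13} = \lambda_{14} = 1/3$, $\lambda_{234} = 2/3$, as strong duality requires.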

We can solve the dual program using the ellipsoid method with Core-Membership as a separation oracle, i.e., a procedure for returning a constraint that is violated. Note that a simple extension to the Core-Membership algorithm will allow us to keep track of the set $T$ for which $e(T) < 0$ during the r-value computation, and hence we can return the inequality about $T$ as the violated constraint. Therefore, Core-Non-Emptiness can run in time polynomial in the running time of Core-Membership, which in turn runs in time exponential only in the treewidth of the graph. Note that when the core is not empty, this program will return an outcome in the core.

6.  CONCLUDING  REMARKS 

We have developed a fully expressive representation scheme for coalitional games whose size depends on the complexity of the interactions among the agents. Our focus on general representation is in contrast to the approach taken in [3, 4]. We have also developed an efficient algorithm for the computation of the Shapley values for this representation. While Core-Membership for MC-nets is coNP-complete, we have developed an algorithm for Core-Membership that runs in time exponential only in the treewidth of the agent graph. We have also extended the algorithm to solve Core-Non-Emptiness. Other than the algorithm for Core-Non-Emptiness in [4] under the restriction of non-negative edge weights, and that in [2] for superadditive games when the value of the grand coalition is given, we are not aware of any explicit description of algorithms for core-related problems in the literature.

The work in this paper is related to a number of areas in computer science, especially in artificial intelligence. For example, the graphical interpretation of MC-nets is closely related to Markov random fields (MRFs) of the Bayes nets community. They both address the issue of conciseness of representation by using the combinatorial structure of weighted hypergraphs. In fact, Kearns et al. first applied these ideas to game theory by introducing a representation scheme derived from Bayes nets to represent non-cooperative games [6]. The representational issues faced in coalitional games are closely related to the problem of expressing valuations in combinatorial auctions [5, 10]. The OR-bid language, for example, is strongly related to superadditivity. The question of the representation power of different patterns is also related to Boolean expression complexity [12]. We believe that with a better understanding of the relationships among these related areas, we may be able to develop more efficient representations and algorithms for coalitional games.

Finally, we would like to end with some ideas for extending the work in this paper. One direction to increase the conciseness of MC-nets is to allow the definition of equivalence classes of agents, similar to the idea of extending Bayes nets to probabilistic relational models. The concept of symmetry is prevalent in games, and the use of classes of agents will allow us to capture symmetry naturally and concisely. This will also address the problem of unpleasing asymmetric representations of symmetric games in our representation.

Along the line of exploiting symmetry, as the agents within the same class are symmetric with respect to each other, we can extend the idea above by allowing functional description of marginal contributions. More concretely, we can specify the value of a rule as dependent on the number of agents of each relevant class. The use of functions will allow concise description of marginal diminishing returns (MDRs). Without the use of functions, the space needed to describe MDRs among $n$ agents in MC-nets is $O(n)$. With the use of functions, the space required can be reduced to $O(1)$.

Another idea to extend MC-nets is to augment the semantics to allow constructs that specify that certain rules cannot be applied simultaneously. This is useful in situations where a certain agent represents a type of exhaustible resource, and therefore rules that depend on the presence of the agent should not apply simultaneously. For example, if agent $i$ in the system stands for coal, we can either use it as fuel for a power plant or as input to a steel mill for making steel, but not for both at the same time. Currently, to represent such situations, we have to specify rules to cancel out the effects of applications of different rules. The augmented semantics can simplify the representation by specifying when rules cannot be applied together.
APPENDIX

We will formally show the correctness of the r-value computation in Algorithm 1 for introduce nodes and join nodes.


Lemma 1. The procedure for computing the r-values of introduce nodes in Algorithm 1 is correct.

Proof. Let $m$ be the newly introduced agent at node $i$. Let $U$ denote the set of agents in the subtree rooted at $i$. By the running intersection property, all interactions (the hyperedges) between $m$ and $U$ must be in node $i$. For all $S \subseteq X_i$ with $m \in S$, let $R$ denote $((U \setminus X_i) \cup S)$, and $Q$ denote $(R \setminus \{m\})$.

\[
\begin{aligned}
r_i(S) &= r(S, R) \\
&= \min_{T : S \subseteq T \subseteq R} e(T) \\
&= \min_{T : S \subseteq T \subseteq R} x(T) - v(T) \\
&= \min_{T : S \subseteq T \subseteq R} x(T \setminus \{m\}) + x_m - v(T \setminus \{m\}) - v(\mathrm{cut}) \\
&= \left( \min_{T' : S \setminus \{m\} \subseteq T' \subseteq Q} e(T') \right) + x_m - v(\mathrm{cut}) \\
&= r_j(S \setminus \{m\}) + x_m - v(\mathrm{cut})
\end{aligned}
\]

The argument for sets $S \subseteq X_i$ with $m \notin S$ is symmetric, except that $x_m$ will not contribute to the reserve due to the absence of $m$. □


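Lemma 2. The procedure for computing the r-values of join nodes in Algorithm 1 is correct.

Proof. Consider any set $S \subseteq X_i$. Let $U_j$ denote the subtree rooted at the left child, $R_j$ denote $((U_j \setminus X_j) \cup S)$, and $Q_j$ denote $(U_j \setminus X_j)$. Let $U_k$, $R_k$, and $Q_k$ be defined analogously for the right child. Let $R$ denote $((U \setminus X_i) \cup S)$.

\[
\begin{aligned}
r_i(S) &= r(S, R) \\
&= \min_{T : S \subseteq T \subseteq R} x(T) - v(T) \\
&= \min_{T : S \subseteq T \subseteq R} \Big( x(S) + x(T \cap Q_j) + x(T \cap Q_k) \\
&\qquad\qquad - v(S) - v(\mathrm{cut}(S, T \cap Q_j)) - v(\mathrm{cut}(S, T \cap Q_k)) \Big) \\
&= \min_{T : S \subseteq T \subseteq R} \big( x(T \cap Q_j) - v(\mathrm{cut}(S, T \cap Q_j)) \big) \\
&\quad + \min_{T : S \subseteq T \subseteq R} \big( x(T \cap Q_k) - v(\mathrm{cut}(S, T \cap Q_k)) \big) \\
&\quad + (x(S) - v(S)) \qquad (*) \\
&= \min_{T : S \subseteq T \subseteq R} \big( x(T \cap Q_j) + x(S) - v(\mathrm{cut}(S, T \cap Q_j)) - v(S) \big) \\
&\quad + \min_{T : S \subseteq T \subseteq R} \big( x(T \cap Q_k) + x(S) - v(\mathrm{cut}(S, T \cap Q_k)) - v(S) \big) \\
&\quad - (x(S) - v(S)) \\
&= \min_{T : S \subseteq T \subseteq R} e(T \cap R_j) + \min_{T : S \subseteq T \subseteq R} e(T \cap R_k) - e(S) \\
&= \min_{T' : S \subseteq T' \subseteq R_j} e(T') + \min_{T'' : S \subseteq T'' \subseteq R_k} e(T'') - e(S) \\
&= r_j(S) + r_k(S) - e(S)
\end{aligned}
\]

where $(*)$ is true as $T \cap Q_j$ and $T \cap Q_k$ are disjoint due to the running intersection property of tree decomposition, and hence the minimum of the sum can be decomposed into the sum of the minima. □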
Abstract. Hard computational problems are often solvable by multiple algorithms that each perform well on different problem instances. We describe techniques for building an algorithm portfolio that can outperform its constituent algorithms, just as the aggregate classifiers learned by boosting outperform the classifiers of which they are composed. We also provide a method for generating test distributions to focus future algorithm design work on problems that are hard for an existing portfolio. We demonstrate the effectiveness of our techniques on the combinatorial auction winner determination problem, showing that our portfolio outperforms the state-of-the-art algorithm by a factor of three.1


1  Introduction 

Although some algorithms are better than others on average, there is rarely a best algorithm for a given problem. Instead, it is often the case that different algorithms perform well on different problem instances. Not surprisingly, this phenomenon is most pronounced among algorithms for solving NP-hard problems, because runtimes for these algorithms are often highly variable from instance to instance. When algorithms exhibit high runtime variance, one is faced with the problem of deciding which algorithm to use; in 1976 Rice dubbed this the "algorithm selection problem" [13]. In the nearly three decades that have followed, the issue of algorithm selection has failed to receive widespread study, though of course some excellent work does exist. By far the most common approach to algorithm selection has been to measure different algorithms' performance on a given problem distribution, and then to use only the algorithm having the lowest average runtime. This approach, to which we refer as "winner-take-all", has driven recent advances in algorithm design and refinement, but has resulted in the neglect of many algorithms that, while uncompetitive on average, offer excellent performance on particular problem instances. Our consideration of the algorithm selection literature, and our dissatisfaction with the winner-take-all approach, has led us to ask the following two questions. First, what general techniques can we use to perform per-instance (rather than per-distribution) algorithm selection? Second, once we have rejected the notion of winner-take-all algorithm evaluation, how ought novel algorithms to be evaluated? Taking the idea of boosting from machine learning as our guiding metaphor, we strive to answer both questions.

1  This  work  has  previously  been  published  as  a  two-page  extended  abstract  [9]. 




* This work is generously supported by DARPA grant F30602-00-2-0598.


1.1  The  Boosting  Metaphor 

Boosting is a machine learning paradigm due to Schapire [17] and widely studied since. Although this paper does not make use of any technical results from the boosting literature, it takes its inspiration from the boosting philosophy. Stated simply, boosting is based on two insights:

1. Poor classifiers can be combined to form an accurate ensemble when the classifiers' areas of effectiveness are sufficiently uncorrelated.

2. New classifiers should be trained on problems on which the current aggregate classifier performs poorly.

In this paper, we argue that algorithm design should be informed by two analogous ideas:

1. Algorithms with high average running times can be combined to form an algorithm portfolio with low average running time when the algorithms' easy inputs are sufficiently uncorrelated.

2. New algorithm design should focus on problems on which the current algorithm portfolio performs poorly.

Of course the analogy to boosting is imperfect; we discuss differences in section 5.

1.2  Case  Study:  Combinatorial  Auctions  (Weighted  Set  Packing) 

To discuss the effectiveness of an algorithm design methodology, it is necessary to perform a case study. We chose to consider the combinatorial auction winner determination problem (WDP), and made use of runtime prediction techniques and runtime data from our previous work [10]. However, it must be emphasized that none of the techniques we propose here are particular to this problem domain. The full version of this paper will also consider other domains; in particular, we have had some positive initial results building portfolios for SAT.

Combinatorial auctions provide a general framework for allocation problems among self-interested agents by allowing bids for bundles of goods. WDP is a weighted set packing problem (SPP): the goal is to choose a non-conflicting subset of bids maximizing the seller's revenue. SPP is NP-complete, and also inapproximable within a constant factor (cf. [15]). Let $n$ be the number of goods, and $m$ be the number of bids. A bid is a pair $\langle S_i, p_i \rangle$, where $S_i \subseteq \{1, \ldots, n\}$ is the set of goods requested by bid $i$, and $p_i$ is that bid's price offer. WDP can be formulated as the following integer program:


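\[
\begin{aligned}
\text{maximize:} \quad & \sum_{i=1}^{m} x_i p_i \\
\text{subject to:} \quad & \sum_{i \mid g \in S_i} x_i \le 1 \quad \forall g \\
& x_i \in \{0, 1\} \quad \forall i
\end{aligned}
\]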
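For intuition about the integer program, a tiny instance can be solved by exhaustive search. This sketch is only meant to illustrate the objective and the conflict constraint; it is exponential in the number of bids and is unrelated to the solvers studied in the case study. The function name and the sample bids are hypothetical.

```python
from itertools import combinations


def solve_wdp(bids):
    """bids is a list of (set_of_goods, price); returns (revenue, chosen indices)."""
    best_revenue, best_choice = 0, ()
    for size in range(1, len(bids) + 1):
        for choice in combinations(range(len(bids)), size):
            taken = set()
            feasible = True
            for i in choice:
                goods = bids[i][0]
                if taken & goods:      # some good would be sold twice
                    feasible = False
                    break
                taken |= goods
            if feasible:
                revenue = sum(bids[i][1] for i in choice)
                if revenue > best_revenue:
                    best_revenue, best_choice = revenue, choice
    return best_revenue, best_choice


# Three goods and four bids; bids 0 and 2 are compatible and worth 8 together.
sample_bids = [({1, 2}, 5), ({2, 3}, 4), ({3}, 3), ({1}, 1)]
print(solve_wdp(sample_bids))
```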
We consider three algorithms for solving WDP: ILOG's CPLEX package; GL (Gonen-Lehmann) [5], a simple branch-and-bound algorithm with CPLEX's LP solver as its heuristic; and CASS [2], a more complex branch-and-bound algorithm with a non-LP heuristic. Unfortunately, we were unable to get access to CABOB [16], another widely-cited WDP algorithm.

1.3  Overview 

In the next three sections we give general methods for our boosting analogy in algorithm design. In section 2 we present a methodology for constructing algorithm portfolios and show some results from our case study. We go on in section 3 to offer practical extensions to our methodology, including techniques for avoiding the computation of costly features, trading off between accuracy on hard and easy instances, and building models when runtime data is capped at some maximum running time. In section 4 we consider the empirical evaluation of portfolios, and describe a method for using a learned model of runtime to generate a test distribution that will be hard for a portfolio. Similar techniques can be used to generate instances that score highly on a given "realism" metric. Finally, section 5 discusses our design choices and compares them to the choices made in related work.

2  Algorithm  Portfolios 

Our previous work [10] demonstrated that statistical regression can be used to learn surprisingly accurate algorithm-specific models of the empirical hardness of given distributions of problem instances. In short, the method proposed in that work is:

1. Use domain knowledge to select features of problem instances that might be indicative of runtime.

2. Generate a set of problem instances from the given distribution, and collect runtime data for the algorithm on each instance.

3. Use regression to learn a real-valued function of the features that predicts runtime.

Given this existing technique for predicting runtime, we now propose building portfolios of multiple algorithms as follows:

1.  Train  a  model  for  each  algorithm,  as  described  above. 

2.  Given  an  instance: 

(a)  Compute  feature  values 

(b)  Predict  each  algorithm’s  running  time  using  runtime  models 

(c)  Run  the  algorithm  predicted  to  be  fastest 

This  technique  is  powerful,  but  deceptively  simple.  For  discussion  and  comparison 
with  other  approaches  in  the  literature,  please  see  section  5.1.  As  we  will  demonstrate 
in  our  case  study,  such  portfolios  can  dramatically  outperform  the  algorithms  of  which 
they  are  composed. 
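The selection step of the portfolio can be sketched in a few lines. Everything named here is hypothetical: the runtime models are stand-in callables rather than the regression models from [10], and `solver_a` and `solver_b` are placeholders rather than the solvers used in the case study.

```python
def select_algorithm(features, runtime_models):
    """Predict each algorithm's runtime and return the name of the fastest."""
    predictions = {name: model(features) for name, model in runtime_models.items()}
    fastest = min(predictions, key=predictions.get)
    return fastest, predictions


def run_portfolio(instance, compute_features, runtime_models, solvers):
    """Compute features, pick the predicted-fastest algorithm, and run it."""
    features = compute_features(instance)
    name, _ = select_algorithm(features, runtime_models)
    return name, solvers[name](instance)


# Toy models of predicted runtime as a function of a single feature.
toy_models = {
    "solver_a": lambda f: 2.0 + 0.1 * f[0],
    "solver_b": lambda f: 0.5 + 1.0 * f[0],
}
toy_solvers = {
    "solver_a": lambda instance: "solved by a",
    "solver_b": lambda instance: "solved by b",
}


def toy_features(instance):
    # Hypothetical feature: the instance's size.
    return [len(instance)]


print(run_portfolio([1], toy_features, toy_models, toy_solvers))
print(run_portfolio([1, 2, 3], toy_features, toy_models, toy_solvers))
```

In this toy setting the cheaper-on-small-inputs model wins on the first instance and the other wins on the second, which is the per-instance behaviour the portfolio is meant to exploit.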
Fig. 1. Gross Hardness for CPLEX (from [10])


Fig.  2.  Mean  Absolute  Error  (from  [10]) 


2.1  Case  Study:  Experimental  Setup 

We performed our case study using data collected in our past work [10], which we recap briefly in this section. All of our results focused on problems of a fixed size: numbers of goods and non-dominated bids were held constant at 256 and 1000 respectively.2 Our instance distribution involved making a uniform choice between nine of the distributions from the Combinatorial Auction Test Suite (CATS) [11], and randomly choosing parameters for each instance. The complete dataset was composed of about 4500 instances. For each instance we collected runtime data for CPLEX 7.1, and computed 25 features that fall roughly into four categories:

1.  Norms  of  the  linear  programming  slack  vector  (integrality  of  the  LP  relaxation  of 
the  IP) 

2.  Deviations  of  prices 

3.  Node  statistics  of  the  Bid-Good  bipartite  graph 

4.  Various  statistics  of  the  Bid  graph  (effectively,  the  problem’s  constraint  graph) 

All  data  was  collected  on  550  MHz  Pentium  Xeon  machines  running  Linux  2.2; 
over  3  years  of  CPU  time  was  spent  gathering  this  data.  Fig.  1  shows  a  3D  histogram 
of  the  distribution  of  hard  instances  across  our  dataset.  Observe  that  CPLEX’s  runtime 
varied  by  seven  orders  of  magnitude  even  though  the  number  of  goods  and  bids  was 
held  constant.  Also,  there  is  considerable  variation  within  most  of  the  distributions. 

Using quadratic regression, we were able to build very accurate models of the logarithm of runtime. Fig. 2 shows a histogram of the mean absolute error in predicting the log of CPLEX’s runtime observed on test set instances. Since our methodology relies on machine learning, we split the data into training, validation, and test sets. We report our portfolio runtimes only on the test set, which was never used to train or evaluate models. An error of 1 in predicting the log means that runtime was mispredicted by a factor of 10, or roughly that an instance was misclassified by one of the bins in Fig. 1; observe that nearly all prediction errors are less than 1.

Footnote 2: In a separate research effort, we are in the process of extending the work from [10] to models of variable problem size; when these models become available it will be possible to extend the techniques presented in this paper without any modification.

Fig. 3. Algorithm and Portfolio Runtimes

Fig. 4. Optimal

Fig. 5. Selected
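The log-runtime modelling described in section 2.1 can be illustrated with a small sketch. It assumes base-10 logarithms (so that an error of 1 corresponds to a factor of 10), expands the features with all pairwise products, and fits ordinary least squares through the normal equations. The helper names and the toy data are hypothetical, and the actual models in [10] may have been fit differently.

```python
import math
from itertools import combinations_with_replacement

def quadratic_expand(x):
    """Return [1, x_1, ..., x_d, all products x_i * x_j with i <= j]."""
    x = list(x)
    return [1.0] + x + [a * b for a, b in combinations_with_replacement(x, 2)]

def solve(a, b):
    """Solve a * w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_log_runtime(feature_vectors, runtimes):
    """Least-squares fit of log10(runtime) on the quadratic feature expansion."""
    rows = [quadratic_expand(x) for x in feature_vectors]
    ys = [math.log10(t) for t in runtimes]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    w = solve(xtx, xty)
    return lambda x: sum(wi * fi for wi, fi in zip(w, quadratic_expand(x)))

def misprediction_factor(log_error):
    """Translate an error in log10(runtime) into a multiplicative runtime error."""
    return 10 ** abs(log_error)

# Toy data whose log-runtime is exactly quadratic in a single feature.
xs = [[0.0], [1.0], [2.0], [3.0], [4.0]]
ts = [10 ** (0.5 + 0.25 * x[0] ** 2) for x in xs]
predict_log = fit_log_runtime(xs, ts)
```

With real data the normal equations can be ill-conditioned, so a numerically more careful solver would be preferable in practice.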


2.2  Case  Study  Results 

We now turn to new results. First, we used the methodology described in section 2.1 to build regression models for two new algorithms (GL and CASS). Fig. 3 compares the average runtimes of our three algorithms (CPLEX, CASS, GL) to that of the portfolio (see footnote 3). Note that CPLEX would be chosen under winner-take-all algorithm selection. The “optimal” bar shows the performance of an ideal portfolio where algorithm selection is performed perfectly and with no overhead. The portfolio bar shows the time taken to compute features (light portion) and the time taken to run the selected algorithm (dark portion). Despite the fact that CASS and GL are much slower than CPLEX on average, the portfolio outperforms CPLEX by roughly a factor of 3. Moreover, neglecting the cost of computing features, our portfolio’s selections take only 5% longer to run than the optimal selections.

Footnote 3: Note the change of scale on the graph, and the repeated CPLEX bar.

Figs. 4 and 5 show the frequency with which each algorithm is selected in the ideal portfolio and in our portfolio. They illustrate the quality of our algorithm selection and the relative value of the three algorithms. Observe that our portfolio does not always make the right choice (in particular, it selects GL much more often than it should). However, most of the mistakes made by our models occur when both algorithms have very similar running times; these mistakes are not very costly, explaining why our portfolio’s choices have a running time so close to optimal.

These results show that our portfolio methodology can work very well even with a small number of algorithms, and when one algorithm’s average performance is considerably better than the others’. We suspect that our techniques could work even better in other settings.

3  Extending  our  Portfolio  Methodology 

Once it has been demonstrated that algorithm portfolios can offer significant speedups over winner-take-all algorithm selection, it is worthwhile to consider modifications to the methodology that make it more useful in practice. Specifically, we describe methods for reducing the amount of time spent computing features, transforming the response variable, and capping runs of some or all algorithms.

3.1  Smart  Feature  Computation 

Feature values must be computed before the portfolio can choose an algorithm to run. We expect that portfolios will be most useful when they combine several exponential-time algorithms having high runtime variance, and that fast polynomial-time features should be sufficient for most models. Nevertheless, on some instances the computation of individual features may take substantially longer than one or even all algorithms would take to run. In such cases it would be desirable to perform algorithm selection without spending as much time computing features, even at the expense of some accuracy in choosing the fastest algorithm. In order to achieve this, we partition the features into sets ordered by time complexity, $S_1, \ldots, S_l$, with $i > j$ implying that each feature in $S_i$ takes significantly longer to compute than each feature in $S_j$ (see footnote 4). The portfolio can start by computing the easiest features, and iteratively compute the next set only if the expected benefit to selection exceeds the cost of computation. More precisely:

1. For each set $S_j$ learn or provide a model $c(S_j)$ that estimates the time required to compute it. Often, this could be a simple average time scaled by input size.

2. Divide the training examples into two sets. Using the first set, train models $M_1^i, \ldots, M_l^i$, with $M_k^i$ predicting algorithm $i$’s runtime using features in $\bigcup_{j=1}^{k} S_j$. Note that $M_l^i$ is the same as the model for algorithm $i$ in our basic portfolio methodology. Let $M_k$ be a portfolio which selects $\arg\min_i M_k^i$.

3. Using the second training set, learn models $D_1, \ldots, D_{l-1}$, with $D_k$ predicting the difference in runtime between the algorithms selected by $M_k$ and $M_{k+1}$ based on $S_k$. The second set must be used to avoid training the difference models on data to which the runtime models were fit.

Footnote 4: We assume here that features will have low runtime variance. We have found this assumption to hold in our case study. If feature runtime variance makes it difficult to partition the features into time complexity sets, smart feature computation is more difficult.

Given  an  instance  x,  the  portfolio  now  works  as  follows: 

4. For $j = 1$ to $l$:

(a) Compute features in $S_j$.

(b) If $D_j[x] > c(S_{j+1})[x]$, continue.

(c) Otherwise, return with the algorithm predicted to be fastest according to $M_j$.
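The selection loop of step 4 might be sketched as follows. All models are passed in as plain callables; the function name `smart_select`, the argument layout and the toy models are illustrative assumptions. Since there is no difference model after the last feature set, the sketch simply stops there and uses the full portfolio.

```python
def smart_select(instance, feature_sets, cost_models, diff_models, portfolios):
    """Compute feature sets in order of cost, stopping when more features are not worth it.

    feature_sets[j](instance)  -> dict with the feature values of the (j+1)-th set
    cost_models[j](instance)   -> estimated time to compute the (j+1)-th set
    diff_models[j](features)   -> predicted runtime gain from also using the next set
    portfolios[j](features)    -> algorithm predicted fastest with sets 1..j+1
    """
    features = {}
    last = len(feature_sets) - 1
    for j, compute in enumerate(feature_sets):
        features.update(compute(instance))            # step (a)
        if j == last:
            break                                      # no further set to consider
        expected_gain = diff_models[j](features)
        next_cost = cost_models[j + 1](instance)
        if expected_gain <= next_cost:                 # step (c): stop early
            break
        # step (b): otherwise continue with the next feature set
    return portfolios[j](features), j + 1              # (algorithm, number of sets used)

# Toy setup with one cheap and one expensive feature set.
feature_sets = [lambda x: {"size": x["size"]}, lambda x: {"density": x["density"]}]
cost_models = [lambda x: 0.0, lambda x: 1.0]
diff_models = [lambda f: 10.0 if f["size"] > 100 else 0.1]
portfolios = [lambda f: "A", lambda f: "B" if f["density"] > 0.5 else "A"]
```

On the toy setup, a small instance is decided from the cheap feature alone, while a large one triggers computation of the expensive feature.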

3.2  Transforming  the  Response  Variable 

Average runtime is an obvious measure of portfolio performance if one’s goal is to minimize computation time over a large number of instances. Since our models minimize root mean squared error, they appropriately penalize 20 seconds of error equally on instances that take 1 second or 10 hours to run. However, another reasonable goal may be to perform well on every instance regardless of its hardness; in this case, relative error is more appropriate. Let $r_i^p$ and $r_i^*$ be the portfolio’s runtime and the optimal runtime respectively on instance $i$, and $n$ be the number of instances. One measure that gives an insight into the portfolio’s relative error is percent optimal:

$$\frac{1}{n} \sum_{i} I\left(r_i^p = r_i^*\right),$$

where $I$ is the indicator function. Another measure of relative error is average percent suboptimal:

$$\frac{1}{n} \sum_{i} \frac{r_i^p - r_i^*}{r_i^*}.$$

Taking a logarithm of runtime is a simple way to equalize the importance of relative error on easy and hard instances. Thus, models that predict a log of runtime help to improve the average percent suboptimal, albeit at some expense in terms of the portfolio’s average runtime. In Figure 6 (overleaf) we show three different functions: linear (identity) and log are the extreme values; clearly, many functions can fall in between. The functions are normalized by their maximum value, since this does not affect regression, but allows us to better visualize their effect. In our case study (section 3.4) we found that the cube root function was particularly effective.
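The two relative-error measures named in this section can be computed as in the sketch below. It assumes that percent optimal is the share of instances on which the portfolio's runtime equals the optimal runtime, and that average percent suboptimal is the mean of $(r_i^p - r_i^*)/r_i^*$, both expressed as percentages; the function names and toy runtimes are hypothetical.

```python
def percent_optimal(portfolio_rt, optimal_rt):
    """Percentage of instances on which the portfolio matched the optimal runtime."""
    n = len(optimal_rt)
    return 100.0 * sum(p == o for p, o in zip(portfolio_rt, optimal_rt)) / n

def average_percent_suboptimal(portfolio_rt, optimal_rt):
    """Mean relative slowdown of the portfolio over the optimal choice, in percent."""
    n = len(optimal_rt)
    return 100.0 * sum((p - o) / o for p, o in zip(portfolio_rt, optimal_rt)) / n

# Toy runtimes in seconds: the portfolio picks a slower algorithm on the second instance only.
portfolio_rt = [1.0, 4.0, 10.0, 3.0]
optimal_rt = [1.0, 2.0, 10.0, 3.0]
print(percent_optimal(portfolio_rt, optimal_rt))             # 75.0
print(average_percent_suboptimal(portfolio_rt, optimal_rt))  # 25.0
```

A single badly mispredicted easy instance can dominate the second measure, which is why it reacts so strongly to the choice of response transformation.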

3.3  Capping  Runs 

The methodology of section 2 requires gathering runtime data for every algorithm on every problem instance in the training set. While the time cost of this step is fundamentally unavoidable for our approach, gathering perfect data for every instance can take an unreasonably long time. For example, if algorithm $a_1$ is usually much slower than $a_2$ but in some cases dramatically outperforms $a_2$, a perfect model of $a_1$’s runtime on hard instances may not be needed to discriminate between the two algorithms. The process of gathering data can be made much easier by capping the runtime of each algorithm at some maximum and recording these runs as having terminated at the captime. This approach is safe if the captime is chosen so that it is (almost) always significantly greater than the minimum of the algorithms’ runtimes; if not, it might still be preferable to sacrifice some predictive accuracy for dramatically reduced model-building time. Note that if any algorithm is capped, it can be dangerous (particularly without a log transformation) to gather data for any other algorithm without capping at the same time, because the portfolio could inappropriately select the algorithm with the smaller captime.
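A minimal sketch of capped data gathering follows. It applies one shared captime to every algorithm, as recommended above, and includes a rough safety check. The `margin` parameter is a hypothetical way of making "significantly greater" concrete, and the check uses uncapped toy runtimes that would not normally be available in practice.

```python
def cap_runtimes(runtimes_by_algo, captime):
    """Record every run as min(runtime, captime), using the same captime for all algorithms."""
    return {name: [min(t, captime) for t in rts] for name, rts in runtimes_by_algo.items()}

def captime_is_safe(runtimes_by_algo, captime, margin=10.0):
    """True if the captime exceeds `margin` times the fastest runtime on every instance."""
    per_instance = zip(*runtimes_by_algo.values())
    return all(captime >= margin * min(times) for times in per_instance)

# Toy runtimes in seconds for two algorithms on three instances.
raw = {"a1": [5.0, 20000.0, 3.0], "a2": [50.0, 40.0, 9000.0]}
capped = cap_runtimes(raw, 1000.0)
```

In the toy data each capped run belongs to the slower algorithm on that instance, so the capped data still identifies the faster algorithm correctly.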

3.4  Case  Study  Results 

Fig.  7  shows  the  performance  of  the  smart  feature  computation  discussed  in  section  3.1, 
with  the  upper  part  of  the  bar  indicating  the  time  spent  computing  features.  Compared 
to  computing  all  features,  we  reduce  overhead  by  almost  half  with  nearly  no  cost  in 
running  time. 


Fig. 6. Transformation Functions (Normalized); axis: Runtime (% of max)


Fig.  7.  Smart  Features 


Transformation   Average Runtime   % Optimal   Average % Suboptimal
(Optimal)        216.4 s           100         0
Log              236.5 s           97          9
Cuberoot         225.6 s           89          17
Linear           225.1 s           81          1284

Table 1. Portfolio Results


Table 1 shows the effect of our response variable transformations on average runtime, percent optimal and average percent suboptimal. The first row has results that would be obtained by a perfect portfolio. As discussed in section 3.2, the linear (identity) transformation yields the best average runtime, while the log function leads to better algorithm selection. We tried several transformation functions between linear and log. Here we only show the best, cube root: it has nearly the same average runtime performance as linear, but also made choices nearly as accurately as log.

4  Focused  Algorithm  Design 

Once we have decided to select among existing algorithms using a portfolio approach, it is necessary to reexamine the way we design and evaluate algorithms. Since the purpose of designing new algorithms is to reduce the time that it will take to solve problems, designers should aim to produce new algorithms that complement an existing portfolio. First, it is essential to choose a distribution $D$ that reflects the problems that will be encountered in practice. Given a portfolio, the greatest opportunity for improvement is on instances that are hard for that portfolio, common in $D$, or both. More precisely, the importance of a region of problem space is proportional to the amount of time the current portfolio spends working on instances in that region. This is analogous to the principle from boosting that new classifiers should be trained on instances that are hard for the existing ensemble, in the proportion that they occur in the original training set.

4.1  Inducing  Hard  Distributions 

Let $H_f$ be a model of portfolio runtime based on instance features, constructed as the minimum of the models that constitute the portfolio. By normalizing, we can reinterpret this model as a density function $h_f$. By the argument above, we should generate instances from the product of this distribution and our original distribution, $D$. However, it is problematic to sample from $D \cdot h_f$: $D$ may be non-analytic (an instance generator), while $h_f$ depends on features and so can only be evaluated after an instance has been created.

One way to sample from $D \cdot h_f$ is rejection sampling [1]: generate problems from $D$ and keep them with probability proportional to $h_f$. This method works best when another distribution is available to guide the sampling process toward hard instances. Test distributions usually have some tunable parameters $\vec{p}$, and although the hardness of instances generated with the same parameter values can vary widely, $\vec{p}$ will often be somewhat predictive of hardness. We can generate instances from $D \cdot h_f$ in the following way (see footnote 5):

1. Create a hardness model $H_p$ with features $\vec{p}$, and normalize it to create a pdf, $h_p$.

2. Generate a large number of instances from $D \cdot h_p$.

3. Construct a distribution over instances by assigning each instance $s$ probability proportional to $\frac{H_f(s)}{h_p(s)}$, and select an instance by sampling from this distribution.

Observe that if $h_p$ turns out to be helpful, hard instances from $D \cdot h_f$ will be encountered quickly. Even in the worst case where $h_p$ directs the search away from hard instances, observe that we still sample from the correct distribution because the weights are divided by $h_p(s)$.

Footnote 5: In true rejection sampling step 2 would generate a single instance that would be then accepted or rejected in step 3. Our technique approximates this process, but doesn’t require us to normalize $H_f$ and allows us to output an instance after generating a constant number of samples.
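The three-step procedure above resembles sampling-importance-resampling and might be sketched as follows. The sketch assumes that each candidate receives weight $H_f(s)/h_p(s)$, with $h_p$ evaluated at the parameters that produced the instance; the function name and all arguments are hypothetical callables supplied by the user.

```python
import random

def induce_hard_instance(sample_params, generate, h_p, H_f, n_samples, rng):
    """Draw candidates guided by h_p, then resample one with weight H_f / h_p.

    sample_params(rng)     -> parameters, assumed to be drawn from D * h_p
    generate(params, rng)  -> an instance built from those parameters
    h_p(params)            -> parameter-space hardness density (must be positive)
    H_f(instance)          -> feature-space hardness model (need not be normalized)
    """
    candidates, weights = [], []
    for _ in range(n_samples):
        params = sample_params(rng)
        instance = generate(params, rng)
        candidates.append(instance)
        # Dividing by h_p undoes the bias introduced by the guided sampling.
        weights.append(H_f(instance) / h_p(params))
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Because `random.Random.choices` accepts unnormalized weights, neither $H_f$ nor the weights themselves need to sum to one, which matches the observation in footnote 5.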

In practice, $D$ may be factored as $D_g \cdot D_{p_i}$, where $D_g$ is a distribution over otherwise unrelated instance generators with different parameter spaces, and $D_{p_i}$ is a distribution over the parameters of the chosen instance generator $i$. In this case it is difficult to learn a single $H_p$. A good solution is to factor $h_p$ as $h_g \cdot h_{p_i}$, where $h_g$ is a hardness model using only the choice of instance generator as a feature, and $h_{p_i}$ is a hardness model in instance generator $i$’s parameter space. Likewise, instead of using a single feature-space hardness model $H_f$, we can train a separate model $H_{f,i}$ for each generator and normalize each to a pdf $h_{f,i}$ (see footnote 6). The goal is now to generate instances from the distribution $D_g \cdot D_{p_i} \cdot h_{f,i}$, which can be done as follows:

1. For every instance generator $i$, create a hardness model $H_{p_i}$ with features $\vec{p}_i$, and normalize it to create a pdf, $h_{p_i}$.

2. Construct a distribution over instance generators $h_g$, where the probability of each generator $i$ is proportional to the average hardness of instances generated by $i$.

3. Generate a large number of instances from $(D_g \cdot h_g) \cdot (D_{p_i} \cdot h_{p_i})$:

(a) select a generator $i$ by sampling from $D_g \cdot h_g$

(b) select parameters for the generator by sampling from $D_{p_i} \cdot h_{p_i}$

(c) run generator $i$ with the chosen parameters to generate an instance.

4. Construct a distribution over instances by assigning each instance $s$ from generator $i$ probability proportional to $\frac{H_{f,i}(s)}{h_g(i) \cdot h_{p_i}(s)}$, and select an instance by sampling from this distribution.
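The factored procedure can be sketched in the same style. It assumes that the step 4 weight divides the per-generator hardness model by both guiding densities, $h_g(i)$ and $h_{p_i}$; the dictionary layout, the function name and the toy generators are hypothetical.

```python
import random

def induce_hard_instance_factored(generators, d_g, h_g, n_samples, rng):
    """Sample a generator, then its parameters, then resample by hardness.

    generators[name] is a dict of callables: "sample_params", "generate", "h_p", "H_f".
    d_g[name] and h_g[name] are the original and hardness-based generator weights.
    """
    names = list(generators)
    # Step 3(a): generators are drawn from D_g * h_g (unnormalized weights suffice).
    gen_weights = [d_g[name] * h_g[name] for name in names]
    candidates, weights = [], []
    for _ in range(n_samples):
        name = rng.choices(names, weights=gen_weights, k=1)[0]
        gen = generators[name]
        params = gen["sample_params"](rng)        # step 3(b): assumed drawn from D_pi * h_pi
        instance = gen["generate"](params, rng)   # step 3(c)
        # Step 4: divide out both guiding distributions.
        weights.append(gen["H_f"](instance) / (h_g[name] * gen["h_p"](params)))
        candidates.append(instance)
    return rng.choices(candidates, weights=weights, k=1)[0]

# Toy generators: one always yields easy instances, the other always hard ones.
toy_generators = {
    "easy": {"sample_params": lambda rng: 1, "generate": lambda p, rng: "easy-instance",
             "h_p": lambda p: 1.0, "H_f": lambda s: 0.0},
    "hard": {"sample_params": lambda rng: 1, "generate": lambda p, rng: "hard-instance",
             "h_p": lambda p: 1.0, "H_f": lambda s: 1.0},
}
```

In the toy setup the easy generator's instances receive zero weight, so only hard instances can be returned.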

4.2  Inducing  Realistic  Distributions 

It is important for our portfolio methodology that we begin with a “realistic” $D$; that is, a distribution accurately reflecting the sorts of problems expected to occur in practice. Care must always be taken to construct a generator or set of generators producing instances that are representative of problems from the target domain. Sometimes, it is possible to construct a function $R_f$ that scores the realism of a generated instance based on features of that instance; such a function can encode additional information about the nature of realistic data that cannot easily be expressed in a generator. If a function $R_f$ is provided, we can construct $D$ from a parameterized set of instance generators by using $R_f$ in place of $H_f$ above and learning $r_p$ in the same way we learned $h_p$. This can allow us to make informed choices when setting the parameters of instance generators, and also to discard less realistic data after it has been generated. Note that when inducing hard distributions a hardness model had to be used because it was infeasible to score each sample by actual portfolio runtime. In the case of inducing realistic distributions this is no longer a problem, because the realism function can be evaluated on each sample. Therefore, our rejection sampling technique is guaranteed to generate instances with increased average realism scores. The use of parameter-space models $r_p$ can still improve performance by reducing the number of samples needed for obtaining good results.

Footnote 6: However, the case study results presented in Figs. 8-10 use hardness models $H_f$ trained on the whole dataset rather than using models trained on individual distributions. Learning new models would probably yield even better results.

Fig. 9. Matching


Fig.  10.  Scheduling 


4.3  Case  Study  Results 

Due to the wide spread of runtimes in our composite distribution $D$ (7 orders of magnitude) and the high accuracy of our model $h_f$ [10], it is quite easy for our technique to generate harder instances. These results are presented in Fig. 8. Because our runtime data was capped, there is no way to know if the hardest instances in the new distribution are harder than the hardest instances in the original distribution; note, however, that very few easy instances are generated. Instances in the induced distribution came predominantly from the CATS “arbitrary” distribution, with most of the rest from “L3”.

To demonstrate that our technique also works in more challenging settings, we sought a different distribution with small runtime variance. As it happens, there has been ongoing discussion in the WDP literature about whether those CATS distributions [11] that are relatively easy could be configured to be harder (see e.g., [4, 16]). We consider two easy distributions with low variance from CATS, matching and scheduling, and show that they indeed can be made harder than originally proposed. Figures 9 and 10 show the histograms of the runtimes of the ideal portfolio before and after our technique was applied. In fact, for these two distributions we generated instances that were (respectively) 100 and 50 times harder than anything we had previously seen! Moreover, the average runtime for the new distributions was greater than the observed maximum running time on the original distribution.

5 Discussion and Related Work


Although it is helpful, our analogy to boosting is clearly not perfect. One key difference lies in the way components are aggregated: classifiers can be combined through majority voting, whereas the whole point of algorithm selection is to run only a single algorithm. We instead advocate the use of learned models of runtime as the basis for algorithm selection, which leads to another important difference. It is not enough for the easy problems of multiple algorithms to be uncorrelated; the models must also be accurate enough to reliably recommend against the slower algorithms on these uncorrelated instances. Finally, while it is impossible to improve on correctly classifying an instance, it is almost always possible to solve a problem instance more quickly. Thus improvement is possible on easy instances as well as on hard instances; the analogy to boosting holds in the sense that focusing on hard regions of the problem space increases the potential gain in terms of reduced average portfolio runtimes.

5.1  Algorithm  Selection 

It has long been understood that algorithm performance can vary substantially across different classes of problems. Rice [13] was the first to formalize algorithm selection as a computational problem, framing it in terms of function approximation. Broadly, he identified the goal of selecting a mapping $S(x)$ from the space of instances to the space of algorithms, to maximize some performance measure $\mathit{perf}(S(x), x)$. Rice offered few concrete techniques, but all subsequent work on algorithm selection can be seen as falling into his framework. We explain our choice of methodology by relating it to other approaches for algorithm selection that have been proposed in the literature.


Parallel Execution. One tempting alternative to portfolios that select a single algorithm is the parallel execution of all available algorithms. While it is often true that additional processors are readily available, it is also often the case that these processors can be put to uses besides running different algorithms in parallel, such as parallelizing a single search algorithm or solving multiple problem instances at the same time. Meaningful comparisons of running time between parallel and non-parallel portfolios require that computational resources be fixed, with parallel execution modelled as ideal (no-overhead) task swapping on a single processor. Let $t^*(x)$ be the time it takes to run the algorithm that is fastest on instance $x$, and let $n$ be the number of algorithms. A portfolio that executes all algorithms in parallel on instance $x$ will always take time $n t^*(x)$. On the data from our case study such parallel execution has roughly the same average runtime as winner-take-all algorithm selection (we have three algorithms and CPLEX is three times slower than the optimal portfolio), while our techniques do much better, achieving running times of roughly $1.05 t^*(x)$.
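The comparison in this paragraph can be made concrete with a short sketch that computes the average runtime of the ideal portfolio, of ideal task-swapped parallel execution ($n t^*(x)$ per instance), of winner-take-all selection, and of a portfolio with given selections. The function name and the toy runtimes are hypothetical and are not the case-study data.

```python
def average_runtimes(runtimes_by_algo, selections):
    """Compare four strategies on per-instance runtimes (one list per algorithm)."""
    algos = list(runtimes_by_algo)
    n_inst = len(selections)
    n_algos = len(algos)
    # t*(x): runtime of the fastest algorithm on each instance.
    best = [min(runtimes_by_algo[a][i] for a in algos) for i in range(n_inst)]
    return {
        "optimal": sum(best) / n_inst,
        "parallel": n_algos * sum(best) / n_inst,
        "winner_take_all": min(sum(runtimes_by_algo[a]) / n_inst for a in algos),
        "portfolio": sum(runtimes_by_algo[selections[i]][i] for i in range(n_inst)) / n_inst,
    }

# Toy runtimes in seconds for three algorithms on four instances.
toy = {"X": [1, 10, 4, 5], "Y": [8, 2, 5, 9], "Z": [9, 9, 1, 4]}
result = average_runtimes(toy, ["X", "Y", "Z", "X"])
print(result)
```

In this toy example the portfolio makes one suboptimal choice (the last instance) and still beats both parallel execution and the single best algorithm.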

In some domains, parallel execution can be a very effective technique. Gomes and Selman [3] proposed such an approach for incomplete SAT algorithms, using the term portfolio to describe a set of algorithms run in parallel. In this domain runtime depends heavily on variables such as random seed, making runtime difficult to predict; thus parallel execution is likely to outperform a portfolio that chooses a single algorithm. In such cases it is possible to extend our methodology to allow for parallel execution. We can add one or more new algorithms to our portfolio, with algorithm $i$ standing as a placeholder for the parallel execution of $k_i$ of the original algorithms; in the training data $i$ would be given a running time of $k_i$ times the minimum of its constituents. This approach would allow portfolios to choose to task-swap sets of algorithms in parts of the feature space where the minimums of individual algorithms’ runtimes are much smaller than their means, but to choose single algorithms in other parts of the feature space. Our use of the term “portfolio” may thus be seen as an extension of the term coined by Gomes and Selman, referring to a set of algorithms and a strategy for selecting a subset (perhaps one) for parallel execution.

Classification. Since algorithm selection is fundamentally discriminative (it entails choosing the algorithm that will exhibit minimal runtime), classification is an obvious approach to consider. Any standard classification algorithm (e.g., a decision tree) could be used to learn which algorithm to choose given features of the instance and labelled training examples. The problem is that such classification algorithms use the wrong error metric: they penalize misclassifications equally regardless of their cost. We want to minimize a portfolio’s average runtime, not its accuracy in choosing the optimal algorithm. Thus we should penalize misclassifications more when the difference between the runtimes of the chosen and fastest algorithms is large than when it is small. This is just what happens when our decision criterion is to select the smallest prediction among a set of regression models that were fit to minimize root mean squared error.

A second classification approach entails dividing running times into two or more bins, predicting the bin that contains the algorithm’s runtime, and then choosing the best algorithm. For example, Horvitz et al. [6, 14] used classification to predict runtime of CSP and SAT solvers with inherently high runtime variance (heavy tails). Despite its similarity to our portfolio methodology, this approach suffers from the use of a classification algorithm to predict runtime. First, the learning algorithm does not use an error function that penalizes large misclassifications (off by more than one bin) more heavily than small misclassifications (off by one bin). Second, this approach is unable to discriminate between algorithms when multiple predictions fall into the same bin. Finally, since runtime is a continuous variable, class boundaries are artificial. Instances with runtimes lying very close to a boundary are likely to be misclassified even by a very accurate model, making accurate models harder to learn.


Markov Decision Processes. Perhaps most related to our paper is work by Lagoudakis and Littman [7, 8]. They worked within the MDP framework, and concentrated on recursive algorithms (e.g., sorting, SAT), sequentially solving the algorithm selection problem on each subproblem. This work demonstrates encouraging results; however, its generality is limited by several factors. First, the use of algorithm selection at each stage of a recursive algorithm can require extensive recoding, and may simply be impossible with ‘black-box’ commercial or proprietary algorithms, which are often among the most competitive. Second, solving the algorithm selection problem recursively requires that the value functions be very inexpensive to compute; in our case study we found that more computationally expensive features were required for accurate predictions of runtime. Finally, these techniques can be undermined by non-Markovian algorithms, such as those using clause learning, taboo lists or other forms of dynamic programming.

Of  course,  our  approach  can  also  be  described  in  an  MDP  framework,  with  each 
action  (choice  of  algorithm)  leading  to  a  terminal  state,  and  reward  equal  to  the  negative 
of  runtime.  Optimal  policy  selection  is  trivial  given  a  good  value  function;  thus  the 
key  to  success  is  good  value  estimation.  Our  approach  emphasizes  making  the  value 
functions — that  is,  models  of  runtime — explicit,  since  this  provides  the  best  defense 
against  good  but  fragile  policies.  We  do  not  describe  our  models  as  MDPs  because  the 
framework  is  redundant  in  the  absence  of  sequential  decision-making. 


Different Regression Approaches. Lobjois and Lemaître [12] select among several simple branch-and-bound algorithms based on a prediction of running time. This work is similar in spirit to our own; however, their prediction is based on a single feature and works only on a particular class of branch-and-bound algorithms.

Since  our  goal  is  to  discriminate  among  algorithms,  it  might  seem  more  appropriate 
to  learn  models  of  pairwise  differences  between  algorithm  runtimes,  rather  than  models 
of  absolute  runtimes.  For  linear  regression  (and  the  forms  of  nonlinear  regression  used 
in  our  work)  it  is  easy  to  show  that  the  two  approaches  are  mathematically  equivalent. 
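For ordinary least squares the equivalence follows from linearity in the response: the fitted coefficients are a linear function of the response vector, so the model fit to $y_1 - y_2$ has exactly the coefficients of the model for $y_1$ minus those of the model for $y_2$. The numerical check below illustrates this for a single feature; the helper name `ols` and the toy data are hypothetical.

```python
from statistics import mean

def ols(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

# Toy feature values and runtimes of two algorithms on four instances.
xs = [0.0, 1.0, 2.0, 3.0]
y1 = [1.0, 3.0, 2.0, 5.0]
y2 = [4.0, 1.0, 3.0, 2.0]

a1, b1 = ols(xs, y1)
a2, b2 = ols(xs, y2)
a_diff, b_diff = ols(xs, [u - v for u, v in zip(y1, y2)])

# The difference model matches the difference of the two absolute models.
print(abs(a_diff - (a1 - a2)) < 1e-12, abs(b_diff - (b1 - b2)) < 1e-12)
```

The same argument applies to any regression that is linear in its coefficients over a fixed basis, provided both algorithms' models use the same basis and training instances.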

5.2  Inducing  Hard  Distributions 

It is widely recognized that the choice of test distribution is important for algorithm development. In the absence of general techniques for generating instances that are both realistic and hard, the development of new distributions has usually been performed manually. An excellent example of such work is Selman et al. [18], which describes a method of generating SAT instances near the phase transition threshold, which are known to be hard for most SAT solvers.

6  Conclusions 

Just as boosting allows weak classifiers to work together effectively, algorithms can be combined into portfolios to build a whole greater than the sum of its parts. First, we have described how to build such portfolios. Our techniques can be elaborated to reduce the cost of computing features, to reduce the time spent gathering training data by capping runs, and to strike the right balance between the penalties for mispredicting easy and hard instances. Second, we argued that algorithm design should focus on problem instances upon which a portfolio of existing algorithms spends most of its time. We have provided techniques for inducing such distributions, and also for refining distributions to emphasize instances that have high scores on a given ‘realism’ function. We performed a case study on combinatorial auctions, and showed that a portfolio composed of CPLEX and two older (and generally much slower) algorithms outperformed CPLEX alone by about a factor of 3. In future work, we aim to perform case studies of our methodology on other hard problems; our first effort in this direction is a portfolio of 10 algorithms which we have entered in the 2003 SAT competition.




Acknowledgments 


Thanks to Ryan Porter, Carla Gomes and Bart Selman for helpful discussions. This
work was supported by DARPA grant F30602-00-2-0598, the Intelligent Information
Systems Institute at Cornell, and a Stanford Graduate Fellowship.

References 

1. A. Doucet, N. de Freitas, and N. Gordon (eds.). Sequential Monte Carlo Methods in Practice.
Springer-Verlag, 2001.

2. Y. Fujishima, K. Leyton-Brown, and Y. Shoham. Taming the computational complexity of
combinatorial auctions: Optimal and approximate approaches. In IJCAI, 1999.

3. C. Gomes and B. Selman. Algorithm portfolios. Artificial Intelligence, 126(1-2):43-62, 2001.

4. R. Gonen and D. Lehmann. Optimal solutions for multi-unit combinatorial auctions: Branch
and bound heuristics. In ACM Conference on Electronic Commerce, 2000.

5. R. Gonen and D. Lehmann. Linear programming helps solving large multi-unit combinatorial
auctions. Technical Report TR-2001-8, Leibniz Center for Research in Computer
Science, April 2001.

6. E. Horvitz, Y. Ruan, C. Gomes, H. Kautz, B. Selman, and M. Chickering. A Bayesian
approach to tackling hard computational problems. In UAI, 2001.

7. M. Lagoudakis and M. Littman. Algorithm selection using reinforcement learning. In ICML,
2000.

8. M. Lagoudakis and M. Littman. Learning to select branching rules in the DPLL procedure
for satisfiability. In LICS/SAT, 2001.

9. K. Leyton-Brown, E. Nudelman, G. Andrew, J. McFadden, and Y. Shoham. A portfolio
approach to algorithm selection. In IJCAI, 2003.

10. K. Leyton-Brown, E. Nudelman, and Y. Shoham. Learning the empirical hardness of
optimization problems: The case of combinatorial auctions. In CP, 2002.

11. K. Leyton-Brown, M. Pearson, and Y. Shoham. Towards a universal test suite for
combinatorial auction algorithms. In ACM EC, 2000.

12. L. Lobjois and M. Lemaître. Branch and bound algorithm selection by performance
prediction. In AAAI, 1998.

13. J. R. Rice. The algorithm selection problem. Advances in Computers, 15:65-118, 1976.

14. Y. Ruan, E. Horvitz, and H. Kautz. Restart policies with dependence among runs: A dynamic
programming approach. In CP, 2002.

15. T. Sandholm. An algorithm for optimal winner determination in combinatorial auctions. In
IJCAI, 1999.

16. T. Sandholm, S. Suri, A. Gilpin, and D. Levine. CABOB: A fast optimal algorithm for
combinatorial auctions. In IJCAI, 2001.

17. R. Schapire. The strength of weak learnability. Machine Learning, 5:197-227, 1990.

18. B. Selman, D. G. Mitchell, and H. J. Levesque. Generating hard satisfiability problems.
Artificial Intelligence, 81(1-2):17-29, 1996.




SATzilla:  An  Algorithm  Portfolio  for  SAT* 


Eugene  Nudelman  Alex  Devkar  Yoav  Shoham 

Department  of  Computer  Science,  Stanford  University 

Kevin  Leyton-Brown  Holger  Hoos 

Department  of  Computer  Science,  University  of  British  Columbia 


1  Introduction 

Inspired by the success of recent work in the constraint
programming community on typical-case complexity, in [3]
we developed a new methodology for using machine learning
to study empirical hardness of hard problems on realistic
distributions. In [2] we demonstrated that this new approach
can be used to construct practical algorithm portfolios. In
brief, the fact that algorithms for solving NP-hard problems
are often relatively uncorrelated means that it is possible
for a portfolio to outperform all of its constituent
algorithms. However, such uncorrelation is a knife that cuts
both ways: a portfolio that makes bad choices among its
constituent algorithms will often have much worse performance
than any of its constituent algorithms.

Our methodology can be outlined as follows:

Offline, as part of algorithm development:

1. Identify a target distribution of problem instances.

2. Select a set of algorithms having relatively uncorrelated
runtimes on this distribution.

3. Using domain knowledge, identify features that characterize
problem instances.

4. Compute features and determine algorithm running times.

5. Use regression to construct models of algorithms’ runtimes.

Online, given an instance:

1. Compute feature values.

2. Predict each algorithm’s running time using learned
runtime models.

3. Run the algorithm predicted to be fastest.
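
The online steps above can be sketched in a few lines. This is a minimal illustration, assuming each learned model is a linear function stored as a (weights, bias) pair; the solver names, weights and feature vectors are hypothetical and are not taken from SATzilla.

```python
def predict_runtime(weights, bias, features):
    """Evaluate a linear runtime model: bias + weights . features."""
    return bias + sum(w * x for w, x in zip(weights, features))


def select_algorithm(models, features):
    """Return the name of the algorithm with the smallest predicted runtime."""
    return min(models, key=lambda name: predict_runtime(*models[name], features))


# Hypothetical models over two features, stored as (weights, bias).
models = {
    "solver_a": ([0.8, -0.1], 0.5),
    "solver_b": ([0.2, 0.6], 1.0),
}
```

Calling `select_algorithm(models, [2.0, 0.5])` would pick the solver to run on an instance with those feature values. Since only the ranking of predictions matters, any monotone transformation of runtime (for example its logarithm) could serve as the regression target.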

*See  [5]  for  a  complete  discussion  of  SATzilla 


2  SATzilla 

SATzilla is a portfolio of SAT solvers built according to
the methodology described above. It includes the following
solvers: 2clseq, Limmat, JeruSat, OKsolver, Relsat, Sato,
Satz-rand, zChaff, eqSatz, Satzoo, kcnfs, and BerkMin.

We began by assembling a broad library of about 5000 SAT
instances, which we gathered from various public websites.
We identified 83 features that could be computed quickly and
that we felt might be useful for predicting runtime. We
computed these features for our set of SAT instances, dropped
some features that were highly correlated, and were left with
56 distinct features. In order to keep feature values to
sensible ranges, as appropriate we normalized features by the
total number of clauses or number of variables. We also
computed runtimes for each algorithm on each of our SAT
instances. Given our features and runtime data, we had a
well-defined supervised learning problem. We built models
using ridge regression, a machine learning technique that
finds a linear model (a hyperplane in feature space) that
minimizes a combination of root mean squared error and a
penalty term for large coefficients. To yield better models,
we ignored all instances that were solved by all algorithms,
by no algorithms, or as a side-effect of feature computation.
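
For readers unfamiliar with ridge regression, here is a minimal pure-Python sketch in its standard textbook form, which minimizes the sum of squared errors plus `lam` times the squared norm of the weights. The function name `ridge_fit` and the penalty parameter `lam` are our own labels; this is not the SATzilla training code.

```python
def ridge_fit(X, y, lam=1.0):
    """Solve (X^T X + lam * I) w = X^T y by Gaussian elimination.

    X is a list of feature rows and y a list of targets (for example
    runtimes). No intercept is fitted; append a constant feature to get one.
    With lam > 0 the system is positive definite, so the pivots are nonzero.
    """
    d = len(X[0])
    # Build the augmented matrix [X^T X + lam * I | X^T y].
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)]
         + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(d)]
    # Forward elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, d):
            factor = A[r][col] / A[col][col]
            for c in range(col, d + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    w = [0.0] * d
    for i in reversed(range(d)):
        residual = A[i][d] - sum(A[i][j] * w[j] for j in range(i + 1, d))
        w[i] = residual / A[i][i]
    return w
```

Larger values of `lam` shrink the coefficients toward zero, which trades some training error for models that are less sensitive to correlated features.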

SAT Competition 2004 - solver description

Upon execution, SATzilla begins by running a UBCSAT [6]
implementation of WalkSat for 30 seconds. In our experience,
this step helps to filter out easy satisfiable instances.
Next, SATzilla runs the Hypre [1] preprocessor, which uses
hyper-resolution to reason about binary clauses. This step is
often able to dramatically shorten the formula, often
resulting in search problems that are easier for DPLL-style
solvers. Perhaps more importantly, the simplification “cleans
up” instances, allowing the subsequent analysis of their
structure to better reflect the problem’s combinatorial
“core.” Third, SATzilla computes its 56 features. Sometimes,
a feature can actually solve the problem; if this occurs,
execution stops. Some features can also take an inordinate
amount of time, particularly with very large inputs. To
prevent feature computation from consuming all of our allotted
time, certain features run only until a timeout is reached, at
which point SATzilla gives up on computing the given feature.
Fourth, SATzilla evaluates a regression model of each
algorithm in order to compute a prediction of that algorithm’s
running time. If some of the features have timed out, a
different model is used, which does not involve the missing
feature and which was trained only on instances where the same
feature timed out. Finally, SATzilla runs the algorithm with
the best predicted runtime until the instance is solved or the
allotted time is used up.
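
The fallback to a different model when features time out can be sketched as a lookup keyed by the set of missing features. The data layout, feature names and numbers below are hypothetical, chosen only to illustrate the idea; a feature that timed out is represented here by `None`.

```python
def predict_with_fallback(models_by_missing, features):
    """Predict runtime with the model trained for this timeout pattern.

    models_by_missing maps a frozenset of timed-out feature names to a
    linear model that uses only the remaining features.
    """
    missing = frozenset(name for name, value in features.items() if value is None)
    model = models_by_missing[missing]  # KeyError if no model covers this pattern
    return model["bias"] + sum(
        weight * features[name] for name, weight in model["weights"].items()
    )


def pick_solver(portfolio, features):
    """Return the solver with the best (smallest) predicted runtime."""
    return min(portfolio, key=lambda s: predict_with_fallback(portfolio[s], features))


# Hypothetical two-solver portfolio over two features.
NO_TIMEOUT = frozenset()
LP_TIMEOUT = frozenset({"lp_slack_mean"})
portfolio = {
    "solver_a": {
        NO_TIMEOUT: {"bias": 0.0,
                     "weights": {"clause_var_ratio": 0.5, "lp_slack_mean": 2.0}},
        LP_TIMEOUT: {"bias": 0.5, "weights": {"clause_var_ratio": 0.25}},
    },
    "solver_b": {
        NO_TIMEOUT: {"bias": 1.0,
                     "weights": {"clause_var_ratio": 0.25, "lp_slack_mean": 0.0}},
        LP_TIMEOUT: {"bias": 1.0, "weights": {"clause_var_ratio": 0.5}},
    },
}
```

Keeping one model per timeout pattern means a prediction never depends on a value that was not actually computed, at the cost of training several models per solver.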

3  Features 

Space  restrictions  prevent  us  from  going  into 
great  detail  about  all  elements  of  SATzilla.  We 
choose  to  use  our  remaining  space  to  give  an 
overview  of  the  features  used  by  SATzilla. 

The features can be roughly categorized into 9 groups. The
first one captures problem size, measured in the number of
clauses, variables, and their ratio. The next three groups
correspond to 3 different constraint graphs associated with
each SAT instance. The Variable-Clause Graph is a bipartite
graph representing which variables participate in which
clause. The Variable Graph has nodes representing variables,
and an edge between any variables that occur in a clause
together. The Conflict Graph (CG) has nodes representing
clauses, and an edge between two clauses whenever they share
a negated literal. For all graphs we compute various node
degree statistics. For CG we also compute statistics of
clustering coefficients, defined, for each node, as the
number of edges among its neighbors divided by k(k - 1)/2,
where k is the number of neighbors. The fifth group measures
the balance of an instance in several different respects: the
number of unary, binary, and ternary clauses, and statistics
of the amount of positive versus negative occurrences of
variables within clauses and per variable. The sixth group
measures the proximity of the instance to a Horn formula by
computing the fraction of clauses that are Horn, and
statistics over variables occurring in a Horn clause. These
groups are motivated by known heuristics and tractable
subclasses. The seventh group of features is obtained by
solving a linear programming relaxation of an integer program
representing the current SAT instance (a feature that can
sometimes solve the SAT instance). Often, for integer
programs, proximity of the LP relaxation solution to an
integral solution is anticorrelated with hardness. We compute
statistics of the integer slacks, as well as the actual
objective value and fraction of variables set to an integer.
The eighth group tries to estimate the hardness of the search
space for a DPLL-type solver. For that we run a DPLL
procedure to a small depth and measure the number of unit
propagations done at various depths. We also estimate the
size of the search space [4] by randomly setting variables
and then doing unit propagation until a contradiction is
found. Our final group of features uses two local search
algorithms, GSAT and SAPS [6]. We run both algorithms many
times, each time continuing the search trajectory until a
plateau cannot be escaped within a given number of steps. We
then average various statistics collected during each run. It
is interesting to note that local-search probing features are
important to the models for most of SATzilla’s algorithms,
even though all of these solvers are DPLL-based.
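
A few of the cheaper syntactic features can be sketched as follows. This is an illustration only: clauses are lists of nonzero integers in the DIMACS convention, the function names are ours, and we read "share a negated literal" as "one clause contains a literal whose negation occurs in the other", which is an assumption about the intended definition. Returning 0.0 for nodes with fewer than two neighbors is also a convention chosen here.

```python
from itertools import combinations


def size_features(num_vars, clauses):
    """Problem-size group: clause count, variable count, and their ratio."""
    return {
        "num_clauses": len(clauses),
        "num_vars": num_vars,
        "clause_var_ratio": len(clauses) / num_vars,
    }


def horn_fraction(clauses):
    """Fraction of clauses that are Horn (at most one positive literal)."""
    horn = sum(1 for c in clauses if sum(1 for lit in c if lit > 0) <= 1)
    return horn / len(clauses)


def conflict_graph(clauses):
    """Adjacency sets over clause indices; i and j are linked when some
    literal of clause i appears negated in clause j."""
    lit_sets = [set(c) for c in clauses]
    adj = {i: set() for i in range(len(clauses))}
    for i, j in combinations(range(len(clauses)), 2):
        if any(-lit in lit_sets[j] for lit in lit_sets[i]):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def clustering_coefficient(adj, node):
    """Edges among the node's neighbors divided by k(k - 1)/2."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention; the ratio is undefined for k < 2
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)
```

The pairwise construction of the conflict graph is quadratic in the number of clauses, so a practical implementation would index clauses by literal instead; the version above favors clarity.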